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Abstract 



We propose a novel algebraic algorithmic framework for dealing with probability distri- 
butions represented by their cumulants such as the mean and covariance matrix. As an 
example, we consider the unsupervised learning problem of finding the subspace on which 
several probability distributions agree. Instead of minimizing an objective function involv- 
ing the estimated cumulants, we show that by treating the cumulants as elements of the 
polynomial ring we can directly solve the problem, at a lower computational cost and with 
higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us 
to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for 
an identifiability criterion. 



Keywords: Computational algebraic geometry, Approximate algebra, Unsupervised 
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1. Introduction 

Comparing high dimensional probability distributions is a general problem in machine learn- 



ing, which occurs in two-sample testing (e.g. Hotelling (1932); Gretton et al. (2007)), pro 



jection pursuit (e.g. Friedman and Tukey (1974)), dimensionality reduction and feature se- 
lection (e.g. Torkkola (2003)). Under mild assumptions, probability densities are uniquely 



determined by their cumulants which are naturally interpreted as coefficients of homoge- 
neous multivariate polynomials. Representing probability densities in terms of cumulants 
is a standard technique in learning algorithms. For example, in Fisher Discriminant Anal- 



ysis (Fisher, 1936), the class conditional distributions are approximated by their first two 



cumulants. 

In this paper, we take this viewpoint further and work explicitly with polynomials. That 
is, we treat estimated cumulants not as constants in an objective function, but as objects 
that we manipulate algebraically in order to find the optimal solution. As an example, we 
consider the problem of finding the linear subspace on which several probability distributions 
are identical: given D-variate random variables X±, . . . , X m , we want to find the linear map 
P £ Tsl dxD such that the projected random variables have the same probability distribution, 



PX 1 



PX r , 



This amounts to finding the directions on which all projected cumulants agree. For the 
first cumulant, the mean, the projection is readily available as the solution of a set of linear 
equations. For higher order cumulants, we need to solve polynomial equations of higher 
degree. We present the first algorithm that solves this problem explicitly for arbitrary 
degree, and show how algebraic geometry can be applied to prove properties about it. 

To clarify the gist of our approach, let us consider a stylized example. In order to solve a 
learning problem, the conventional approach in machine learning is to formulate an objective 
function, e.g. the log likelihood of the data or the empirical risk. Instead of minimizing an 
objective function that involves the polynomials, we consider the polynomials as objects in 
their own right and then solve the problem by algebraic manipulations. The advantage of 
the algebraic approach is that it captures the inherent structure of the problem, which is 
in general difficult to model in an optimization approach. In other words, the algebraic 
approach actually solves the problem, whereas optimization searches the space of possible 
solutions guided by an objective function that is minimal at the desired solution, but can give 
poor directions outside of the neighborhood around its global minimum. Let us consider 
the problem where we would like to find the direction v E M 2 on which several sample 
covariance matrices Si, ... , S m C IR 2x2 are equal. The usual ansatz would be to formulate 
an optimization problem such as 



argmm 
IMI=i 



l<i,j<m 



u T S,-f 



v T T,jV 



(1) 



This objective function measures the deviation from equality for all pairs of covariance 
matrices; it is zero if and only if all projected covariances are equal and positive otherwise. 
Figure [I] shows an example with three covariance matrices (left panel) and the value of 
the objective function for all possible projections v = [cos(a) sin(a)] T . The solution to 
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Covariance matrices 



Objective function over all possible projections 












True solution (a=50) 

— Local minimum (a=154) 
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Figure 1: Illustration of the optimization approach. The left panel shows the contour plots 
of three sample covariance matrices. The black line is the true one-dimensional 
subspace on which the projected variances are exactly equal, the magenta line 
corresponds to a local minimum of the objective function. The right panel shows 
the value of the objective function over all possible one-dimensional subspaces, 
parameterized by the angle a to the horizontal axis; the angles corresponding to 
the global minimum and the local minimum are indicated by black and magenta 
lines respectively. 



this non-convex optimization problem can be found using a gradient-based search procedure, 
which may terminate in one of the local minima (e.g. the magenta line in Figure[T]) depending 
on the initialization. 

However, the natural representation of this problem is not in terms of an objective 
function, but rather a system of equations to be solved for v, namely 

v T Hiv = ■ ■ ■ = v T T, m v. (2) 

In fact, by going from an algebraic description of the set of solutions to a formulation as an 
optimization problem in Equation [TJ we lose important structure. In the case where there is 
an exact solution, it can be attained explicitly with algebraic manipulations. However, when 
we estimate a covariance matrix from finite or noisy samples, there exists no exact solution 
in general. Therefore we present an algorithm which combines the statistical treatment of 
uncertainty in the coefficients of polynomials with the exactness of algebraic computations 
to obtain a consistent estimator for v that is computationally efficient. 

Note that this approach is not limited to this particular learning task. In fact, it is 
applicable whenever a set of solutions can be described in terms of a set of polynomial 
equations, which is a rather general setting. For example, we could use a similar strategy to 
find a subspace on which the projected probability distribution has another property that 
can be described in terms of cumulants, e.g. independence between variables. Moreover, 
an algebraic approach may also be useful in solving certain optimization problems, as the 
set of extrema of a polynomial objective function can be described by the vanishing set 
of its gradient. The algebraic viewpoint also allows a novel interpretation of algorithms 
operating in the feature space associated with the polynomial kernel. We would therefore 
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argue that methods from computational algebra and algebraic geometry are useful for the 
wider machine learning community. 



Projection of covariance matrices 




Quadratic polynomial 
V T Y>iV = 1i T E2l> 

w T (£i - E 2 )w = 



an «i2 

0-21 &22 
2 



v = 



a n X z + (a 12 + a 21 )XY 
+ a 22 ^ 2 = 



Polynomial in coefficient space 

XY 




<fi2 = (aii,ai 2 + 021,022) 



Figure 2: Representation of the problem: the left panel shows sample covariance matrices 
Si and £2 with the desired projection v. In the middle panel, this projection is 
defined as the solution to a quadratic polynomial. This polynomial is embedded 
in the vector space of coefficients spanned by the monomials X 2 ,Y 2 and XY 
shown in the right panel. 



Let us first of all explain the representation over which we compute. We will proceed in 
the three steps illustrated in Figure [2j from the geometric interpretation of sample covariance 
matrices in data space (left panel), to the quadratic equation defining the projection v 
(middle panel) , to the representation of the quadratic equation as a coefficient vector (right 
panel). To start with, we consider the Equation [2] as a set of homogeneous quadratic 
equations defined by 



"(^ 



VI < i,j < m, 



(3) 



where we interpret the components of v as variables, v = [X Y] . The solution to these 
equations is the direction in R 2 on which the projected variance is equal over all covariance 
matrices. Each of these equations corresponds to a quadratic polynomial in the variables 
X and Y, 



^ T (S 
„.T 



an au 

0-21 «22 

2 



a n X 2 + (012 + a 21 )XY + a 22 Y 2 



(4) 



which we embed into the vector space of coefficients. The coordinate axis are the monomials 



{X 2 ,XY,Y 2 }, i.e. the three independent entries in the Gram matrix (Ej 
the polynomial in Equation [4] becomes the coefficient vector 



Ej). That is, 



[an au + a 2 i a 22 ] 



4 



Algebraic Geometric Comparison of Probability Distributions 



The motivation for the vector space interpretation is that every linear combination of the 
Equations [3] is also a characterization of the set of solutions: this will allow us to find a 
particular set of equations by linear combination, from which we can directly obtain the 
solution. Note, however, that the vector space representation does not give us all equations 
which can be used to describe the solution: we can also multiply with arbitrary polynomials. 
However, for the algorithm that we present here, linear combinations of polynomials are 
sufficient. 



Step 1: Polynomials in coefficient space 
kXY 



— ► 

Y 2 



X 2 



Step 2: Approximate linear span 
\.XY 




Step 3: Intersection with {XY, Y 2 } 
tXY 

Y(aX + j3Y) 




Figure 3: Illustration of the algebraic algorithm. The left panel shows the vector space of 
coefficients where the polynomials corresponding to the Equations [3] are consid- 
ered as elements of the vector space shown as red points. The middle panel shows 
the approximate 2-dimensional subspace (blue surface) onto which we project the 
polynomials. The right panel shows the one-dimensional intersection (orange line) 
of the approximate subspace with the plane spanned by spanned by {XY,Y 2 }. 
This subspace is spanned by the polynomial Y(otX + /3Y), so we can divide by 
the variable Y. 



Figure [3] illustrates how the algebraic algorithm works in the vector space of coefficients. 
The polynomials Q = {</y}" J= i span a space of constraints which defines the set of solu- 
tions. The next step is to find a polynomial of a certain form that immediately reveals 
the solution. One of these sets is the linear subspace spanned by the monomials {XY, Y 2 }: 
any polynomial in this span is divisible by Y. Our goal is now to find a polynomial which 
is contained in both this subspace and the span of Q. Under mild assumptions, one can 
always find a polynomial of this form, and it corresponds to an equation 

Y(aX + pY) = 0. (5) 

Since this polynomial is in the span of Q, our solution v has to be a zero of this particular 
polynomial: vi{av\ + j3v2) = 0. Moreover, we can assum^] that vi / 0, so that we can 
divide out the variable Y to get the linear factor (aX + /3Y), 



= aX + PY = [a 0\ 



v. 



1. This is a consequence of the generative model for the observed polynomials which is introduced in 



Section 2.1 In essence, we use the fact that our polynomials have no special property (apart from the 



existence of a solution) with probability one. 
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Hence v = a\ is the solution up to arbitrary scaling, which corresponds to the one- 

dimensional subspace in Figure [3] (orange line, right panel). A more detailed treatment of 
this example can also be found in Appendix \A\ 

In the case where there exists a direction v on which the projected covariances are ex- 
actly equal, the linear subspace spanned by the set of polynomials Q has dimension two, 
which corresponds to the degrees of freedom of possible covariance matrices that have fixed 
projection on one direction. However, since in practice covariance matrices are estimated 
from finite and noisy samples, the polynomials Q usually span the whole space, which means 
that there exists only a trivial solution v = 0. This is the case for the polynomials pictured 
in the left panel of Figure [3j Thus, in order to obtain an approximate solution, we first de- 
termine the approximate two-dimensional span of Q using a standard least squares method 
as illustrated in the middle panel. We can then find the intersection of the approximate 
two-dimensional span of Q with the plane spanned by the monomials {XY, Y 2 }. As we have 
seen in Equation [5j the polynomials in this span provide us with a unique solution for v up 
to scaling, corresponding to the fact that the intersection has dimension one (see the right 
panel of Figure [3]). Alternatively, we could have found the one-dimensional intersection with 
the span of X 2 } and divided out the variable X. In fact, in the final algorithm we will 
find all such intersections and combine the solutions in order to increase the accuracy. Note 
that we have found this solution by solving a simple least-squares problem (second step, 
middle panel of Figure [3]). In contrast, the optimization approach (Figure [T]) can require a 
large number of iterations and may converge to a local minimum. A more detailed example 
of the algebraic algorithm can be found in Appendix [A} 




Figure 4: The left panel shows two sample covariance matrices in the plane, along with a 
direction on which they are equal. In the right panel, a third (green) covariance 
matrix does not have the same projected variance on the black direction. 

The algebraic framework does not only allow us to construct efficient algorithms for 
working with probability distributions, it also offers powerful tools to prove properties of 
algorithms that operate with cumulants. For example, we can answer the following central 
question: how many distinct data sets do we need such that the subspace with identical 
probability distributions becomes uniquely identifiable? This depends on the number of 
dimensions and the cumulants that we consider. Figure [4] illustrates the case where we 
are given only the second order moment in two dimensions. Unless Si — £2 is indefinite, 
there always exists a direction on which two covariance matrices in two dimensions are 
equal (left panel of Figure |4| — irrespective of whether the probability distributions are 
actually equal. We therefore need at least three covariance matrices (see right panel), or to 
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consider other cumulants as well. We derive a tight criterion on the necessary number of 
data sets depending on the dimension and the cumulants under consideration. The proof 
hinges on viewing the cumulants as polynomials in the algebraic geometry framework: the 
polynomials that define the sought-after projection (e.g. Equations [3]) generate an ideal 
in the polynomial ring which corresponds to an algebraic set that contains all possible 
solutions. We can then show how many independent polynomials are necessary so that the 
dimension of the linear part of the algebraic set has smaller dimension in the generic case. 
We conjecture that these proof techniques are also applicable to other scenarios where we 
aim to identify a property of a probability distribution from its cumulants using algebraic 
methods. 

Our work is not the first that applies geometric or algebraic methods to Machine Learn- 
ing or statistics: for example, methods from group theory have already found their applica- 
tion in machine learning, e.g. Kondor (2007); Kondor and Borgwardt (2008); there are also 



algebraic methods estimating structured manifold models for data points as in Vidal et al. 
(2005) which are strongly related to polynomial kernel PCA — a method which can itself 



be interpreted as a way of finding an approximate vanishing set. 

The field of Information Geometry interprets parameter spaces of probability distribu- 
tions as differentiable manifolds and studies them from an information-theoretical point of 
view (see for example the standard book by Amari and Nagaoka ( 2000| )), with recent inter- 
pretations and improvements stemming from the field of algebraic geometry by Watanabe 



(2009). There is also the nascent field of algebraic statistics which studies the parameter 



spaces of mainly discrete random variables in terms of commutative algebra and algebraic 



geometry, see the recent overviews by (Sturmfels 2002 chapter 8) and Drton et al. ( 2010[ ) or 



the book by Gibilisco et al. (2010) which also focuses on the interplay between information 
geometry and algebraic statistics. These approaches have in common that the algebraic 
and geometric concepts arise naturally when considering distributions in parameter space. 

Given samples from a probability distribution, we may also consider algebraic structures 
in the data space. Since the data are uncertain, the algebraic objects will also come with 
an inherent uncertainty, unlike the exact manifolds in the case when we have an a-priori 
family of probability distributions. Coping with uncertainties is one of the main interests 
of the emerging fields of approximative and numerical commutative algebra, see the book 



by Stetter (2004) for an overview on numerical methods in algebra, and (Kreuzer et al 



2009) for recent developments in approximate techniques on noisy data. There exists a 



wide range of methods; however, to our knowledge, the link between approximate algebra 
and the representation of probability distributions in terms of their cumulants has not been 
studied yet. 

The remainder of this paper is organized as follows: in the next Section [2j we intro- 
duce the algebraic view of probability distribution, rephrase our problem in terms of this 
framework and investigate its identifiability. The algorithm for the exact case is presented 
in Section [3j followed by the approximate version in Section [4j The results of our numeri- 
cal simulations and a comparison against Stationary Subspace Analysis (SSA) (von Biinau 



et al. , 2009 ) can be found in Section [5j In the last Section [6j we discuss our findings and 
point to future directions. The appendix contains an example and proof details 
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2. The Algebraic View on Probability Distributions 

In this section we introduce the algebraic framework for dealing with probability distribu- 
tions. This requires basic concepts from complex algebraic geometry. A comprehensive in- 



troduction to algebraic geometry with a view to computation can be found in the book (Cox 



et al. 2007). In particular, we recommend to go through the Chapters 1 and 4. 

In this section, we demonstrate the algebraic viewpoint of probability distributions on 
the application that we study in this paper: finding the linear subspace on which probability 
distributions are equal. 

Problem 2.1 Let X\, . . . ,X m be a set of D-variate random variables, having smooth den- 
sities. Find all linear maps P G TH dxD such that the transformed random variables have the 
same distribution, 

PX 1 PX m . 

In the first part of this section, we show how this problem can be formulated algebraically. 
We will first of all review the relationship between the probability density function and its 
cumulants, before we translate the cumulants into algebraic objects. Then we introduce the 
theoretical underpinnings for the statistical treatment of polynomials arising from estimated 
cumulants and prove conditions on identifiability for the problem addressed in this paper. 



2.1 From Probability Distributions to Polynomials 

The probability distribution of every smooth real random variable X can be fully character- 
ized in terms of its cumulants, which are the tensor coefficients of the cumulant generating 
function. This representation has the advantage that each cumulant provides a compact 
description of certain aspects of the probability density function. 

Definition 2.2 Let X be a D-variate random variable. Then by K n (X) E IR- D(xn ' 1 we denote 
the n-th cumulant, which is a real tensor of degree n. 

Let us introduce a useful shorthand notation for linearly transforming tensors. 

Definition 2.3 Let A G C dxD be a matrix. For a tensor T G R D x ™ (i.e. a real tensor T 
of degree n of dimension D n = D • D • . . . ■ D) we will denote by AoT the application of A 
to T along all tensor dimensions, i.e. 

D D 

(A o T) ix _ An = y~] ■ ■ ■ y] A il j 1 • . . . • Ai n j n Tj x _j n . 
ii=i i»=i 

The cumulants of a linearly transformed random variable are the multilinearly transformed 
cumulants, which is a convenient property when one is looking for a certain linear subspace. 



Proposition 2.4 Let X be a real D -dimensional random variable and let A G M, dxD be a 
matrix. Then the cumulants of the transformed random variable AX are the transformed 
cumulants, 

K n (AX) = AoK n (X). 
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We now want to formulate our problem in terms of cumulants. First of all, note that 
PXi ~ PXj if and only if vXi ~ vXj for all row vectors v G span P T . 

Problem 2.5 Find all d- dimensional linear subspaces in the set of vectors 



S = {veR D v ] Xi v 1 X m } 



T 



= {t> G R v 1 o K n (Xi) = v 1 o k„(X,-), neK,l<!,j<m}. 

Note that we are looking for linear subspaces in S, but that S itself is not a vector space in 
general. Apart from the fact that S is homogeneous, i.e. XS = S for all A 6 fi, there is no 
additional structure that we utilize. 

For the sake of clarity, in the remainder of this paper we restrict ourselves to the first two 
cumulants. Note, however, that one of the strengths of the algebraic framework is that the 
generalization to arbitrary degree is straightforward; throughout this paper, we indicate the 
necessary changes and differences. Thus, from now on, we denote the first two cumulants 
by m = Ki(Xi) and Ej = ^(Aj) respectively for all 1 < i < m. Moreover, without loss of 
generality, we can shift the mean vectors and choose a basis such that the random variable 
X rn has zero mean and unit covariance. Thus we arrive at the following formulation. 

Problem 2.6 Find all d- dimensional linear subspaces in 

S = {v G K D | w T (Si - I)v = 0, v T m = 0, 1 < i < (m - 1)}. 

Note that S is the set of solutions to m — 1 quadratic and m — 1 linear equations in D 
variables. Now it is only a formal step to arrive in the framework of algebraic geometry: let 
us think of the left hand side of each of the quadratic and linear equations as polynomials 
qi, . . . , q m -i and f±, . . . , f m -i in the variables T±, . . . , Tp respectively, 

qi = [Ti ■ ■ ■ T D ] o (Sj - /) and fa = [T ± ■ ■ ■ T D ] o fj, u 

which are elements of the polynomial ring over the complex numbers in D variables, 
C[Ti, . . . ,Td]. Note that in the introduction we have used X and Y to denote the vari- 
ables in the polynomials, we will now switch to Ti, . . . , Tp in order to avoid confusion with 
random variables. Thus S can be rewritten in terms of polynomials, 

S = {v £~R D | qi(v) = fi(v) = 0Vl<i<m-l}, 

which means that S is an algebraic set. In the following, we will consider the corresponding 
complex vanishing set 

S = V(<?i, . . . , q m -i,fi, • • • , fm-i) 

:= {v G C D | qi (v) = fi(v) = 0Vl<Km-l}CC D 

and keep in mind that eventually we will be interested in the real part of S. Working over 
the complex numbers simplifies the theory and creates no algorithmic difficulties: when 
we start with real cumulant polynomials, the solution will always be real. Finally, we can 
translate our problem into the language of algebraic geometry. 
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Problem 2.7 Find all d- dimensional linear subspaces in the algebraic set 

S = V(qi, . . . , q m -x,fx, . . . , fm-i). 

So far, this problem formulation does not include the assumption that a solution exists. In 
order to prove properties about the problem and algorithms for solving it we need to assume 
that there exist a e?-dimensional linear subspace S' C S. That is, we need to formulate a 
generative model for our observed polynomials q%,..., q m -i, fx, . . . , / m -l- To that end, we 
introduce the concept of a generic polynomial, for a technical definition see Appendix B. 
Intuitively, a generic polynomial is a continuous, polynomial valued random variable which 
almost surely has no algebraic properties except for those that are logically implied by the 
conditions on it. An algebraic property is an event in the probability space of polynomials 
which is defined by the common vanishing of a set of polynomial equations in the coefficients. 
For example, the property that a quadratic polynomial is a square of linear polynomial is 
an algebraic property, since it is described by the vanishing of the discriminants. In the 



context of Problem 2.7 we will consider the observed polynomials as generic conditioned 
on the algebraic property that they vanish on a fixed d-dimensional linear subspace S'. 

One way to obtain generic polynomials is to replace coefficients with e.g. Gaussian 
random variables. For example, a generic homogeneous quadric q £ C[Ti,T2] is given by 

q = Z\\Tl + Z12T1T2 + Z22T2, 

where the coefficients Zij ~ A/"(/%, 0«) are independent Gaussian random variables with 
arbitrary parameters. Apart from being homogeneous, there is no condition on q. If we 
want to add the condition that q vanishes on the linear space defined by T\ = 0, we would 
instead consider 

q = ZuTf + Zi 2 TiT 2 . 

A more detailed treatment of the concept of genericity, how it is linked to probabilistic 
sampling, and a comparison with the classical definitions of genericity can be found in 



Appendix B.l 



We are now ready to reformulate the genericity conditions on the random variables 
Xx,. . . ,X m in the above framework. Namely, we have assumed that the Aj are general 
under the condition that they agree in the first two cumulants when projected onto some 



linear subspace S' . Rephrased for the cumulants, Problems 2A_ and 2/7 become well-posed 
and can be formulated as follows. 

Problem 2.8 Let S' be an unknown d-dimensional linear subspace in <C D . Assume that 
fx, ... , f m -x o,re generic homogenous linear polynomials, and qx, ■ ■ . , q m -i are generic ho- 
mogenous quadratic polynomials, all vanishing on S'. Find all d-dimensional linear sub- 
spaces in the algebraic set 

S = V(qx, • • • , q m -x, fx, ■ • • , fm-x)- 

As we have defined "generic" as an implicit "almost sure" statement, we are in fact looking 
for an algorithm which gives the correct answer with probability one under our model 
assumptions. Intuitively, S' should be also the only (i-dimensional linear subspace in S, 
which is not immediately guaranteed from the problem description. Indeed this is true if 
m is large enough, which is the topic of the next section. 
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2.2 Identifiability 



In the last subsection, we have seen how to reformulate our initial Problem 2.1 about 



comparison of cumulants as the completely algebraic Problem 2.8. We can also reformulate 



identifiability of the true solution in the original problem in an algebraic way: identifiability 



in Problem 2.1 means that the projection P can be uniquely computed from the probability 



distributions. Following the same reasoning we used to arrive at the algebraic formulation 



in Problem 2.8, one concludes that identifiability is equivalent to the fact that there exists 
a unique linear subspace in S. 

Since identifiability is now a completely algebraic statement, it can be treated also in 
algebraic terms. In Appendix [Bj we give an algebraic geometric criterion for identifiability 
of the stationary subspace; we will sketch its derivation in the following. 

The main ingredient is the fact that, intuitively spoken, every generic polynomials carries 
one degree of freedom in terms of dimension, as for example the following result on generic 
vector spaces shows: 

Proposition 2.9 Let V be an algebraic 'property such that the polynomials with property V 
form a vector space V. Let fi, . . . , f n £ C[Ti, . . . Tp] be generic polynomials satisfying V. 
Then 

rank span (/i, . . . , /„) = min(n, dimV). 



Proof This is Proposition B.14 in the appendix 



On the other hand, if the polynomials act as constraints, one can prove that each one 
reduces the degrees of freedom in the solution by one: 

Proposition 2.10 Let Z be a sub-vector space ofC D . Let fi, ■ ■ ■ , f n be generic homogenous 
polynomials in D variables (of fixed, but arbitrary degree each), vanishing on Z. Then for 
their common vanishing set V(/i, . . . , f n ) = {x 6 | fi(x) = Vz}, one can write 

v(h,...j n ) = zuu, 

where U is an algebraic set with 

dimU < max(D — n, 0). 
Proof This follows from Corollary B.32| in the appendix. ■ 



Proposition 2.10 can now be directly applied to Problem 2.8 It implies that S = S' if 
2(m — 1) > D + 1, and that S' is the maximal dimensional component of S if 2(m — 1) > 
D — d + 1. That is, if we start with m random variables, then S' can be identified uniquely 
if 

2(m-l) >D-d+l 
with classical algorithms from computational algebraic geometry in the noiseless case. 

Theorem 2.11 Let X\, . . . ,X m be random variables. Assume there exists a projection 
P € M, dxD such that the first two cumulants of all PX\, . . . , PX m agree and the cumulants 
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are generic under those conditions. Then the projection P is identifiable from the first two 
cumulants alone if 

D-d+1 

m > h 1. 

2 

Proof This is a direct consequence of Proposition |B.36| in the appendix, applied to the 



reformulation given in Problem 2.8 It is obtained by applying Proposition 2.10 to the 
generic forms vanishing on the fixed linear subspace S' , and using that S' can be identified 
in S if it is the biggest dimensional part. ■ 

We have seen that identifiability means that there is an algorithm to compute P uniquely 
when the cumulants are known, resp. to compute a unique S from the polynomials fi, qi. It 
is not difficult to see that an algorithm doing this can be made into a consistent estimator 
when the cumulants are sample estimates. We will give an algorithm of this type in the 
following parts of the paper. 

3. An Algorithm for the Exact Case 



In this section we present an algorithm for solving Problem 2.8, under the assumption 



that the cumulants are known exactly. We will first fix notation and introduce important 



algebraic concepts. In the previous section, we derived in Problem 2.8 an algebraic formu- 
lation of our task: given generic quadratic polynomials qi, . . . , q m -\ and linear polynomials 
fi, . . . ,f m -i, vanishing on a unknown linear subspace S' of G D , find S' as the unique d- 
dimensional linear subspace in the algebraic set V(gi, . . . , q m -i, fi, ■ ■ . , fm-l)- First of all, 
note that the linear equations fi can easily be removed from the problem: instead of looking 
at <C D , we can consider the linear subspace defined by the fi, and examine the algebraic set 
y(q[, • ■ • , q' m _i), where q[ are polynomials in D — m + 1 variables which we obtain by substi- 
tuting m — 1 variables. So the problem we need to examine is in fact the modified problem 
where we have only quadratic polynomials. Secondly, we will assume that m — 1 > D. 



Then, from Proposition 2.10 we know that S = S' and Problem 2.8 becomes the following. 



Problem 3.1 Let S be an unknown d- dimensional subspace of C . Given m — 1 > D 
generic homogenous quadratic polynomials qi, . . . , q m —\ vanishing on S, find the d-dimensional 
linear subspace 

S = V(gi, . . .,q m -i). 

Of course, we have to say what we mean by finding the solution. By assumption, the 
quadratic polynomials already fully describe the linear space S. However, since S is a 
linear space, we want a basis for S, consisting of d linearly independent vectors in <C D . 
Or, equivalently, we want to find linearly independent linear forms £\, . . . ,&D—d such that 
£i(x) = for all x £ S. The latter is the correct description of the solution in algebraic terms. 
We now show how to reformulate this in the right language, following the algebra- geometry 
duality. The algebraic set S corresponds to an ideal in the polynomial ring C[T\, . . . , Tjj]. 

Notation 3.2 We denote the polynomial ring C[Ti, . . . ,Td] by R. The ideal of S is an 
ideal in R, and we denote it by by s = I(S). Since S is a linear space, there exists a linear 
generating set l±, . . . ,$D-d °f 5 which we will fix in the following. 
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We can now relate the Problem 3.1 to a classical problem in algebraic geometry. 



Problem 3.3 Let m > D and q±, ■ ■ ■ ,q m -i be generic homogenous quadratic polynomials 
vanishing on a linear d-dimensional subspace S C G D . Then find a linear basis for the 
radical ideal 

V (qi, ■ ■ -,q m -i) = i(V(<?i, . . . ,q m -i)) = l(S). 

The first equality follows from Hilbert's Nullstellensatz. This also shows that solving the 
problem is in fact a question of computing a radical of an ideal. Computing the radical 
of an ideal is a classical problem in computational algebraic geometry, which is known 



to be difficult (for a more detailed discussion see Section 3.3). However, if we assume 
m — 1 > D(D + l)/2 — d(d + l)/2, we can dramatically reduce the computational cost 
and it is straightforward to derive an approximate solution. In this case, the qi generate 
the vector space of homogenous quadratic polynomials which vanish on S, which we will 



denote by 52- That this is indeed the case, follows from Proposition 2.9, and we have 



dimS2 = D(D + l)/2 — d(d + l)/2, as we will calculate in Remark 3.12 

Before we continue with solving the problem, we will need to introduce several concepts 
and abbreviating notations. First we introduce notation to denote sub-vector spaces which 
contain polynomials of certain degrees. 

Notation 3.4 Let I be a sub-C-vector space of R, i.e. X = R, or X is some ideal of R, 
e.g. X = s. We denote the sub-C-vector space of homogenous polynomials of degree k in 
X by Xfc (in commutative algebra, this is standard notation for homogenously generated 
R-modules) . 

For example, the homogenous polynomials of degree 2 vanishing on S form exactly the 
vector space S2- Moreover, for any X, the equation X& = X n Rk holds. The vector spaces 
i?2 and 52 will be the central objects in the following chapters. As we have seen, their 
dimension is given in terms of triangular numbers, for which we introduce some notation: 

Notation 3.5 We will denote the n-th triangular number by A(n) = , 

The last notational ingredient will capture the structure which is imposed on Rk by the 
orthogonal decomposition C D = S © S 1 - . 

Notation 3.6 Let S 1 - be the orthogonal complement of S. Denote its ideal by n = I (»S'" L ). 

Remark 3.7 As n and 5 are homogenously generated in degree one, we have the calculation 
rules 



Sfc+i=Sfc-i?i and tlfc+i = tifc • i?i , 



(si) fc = (s fc ) fc and (m) fc = (n fc ^ 



where ■ is the symmetrized tensor or outer product of vector spaces (these rules are canoni- 
cally induced by the so-called graded structure of R-modules). In terms of ideals, the above 
decomposition translates to 

Ri =s\ ©tii. 
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Using the above rules and the binomial formula for ideals, this induces an orthogonal de- 
composition 

R 2 =Ri ■ Ri = (5i ni) • (si ni) = (si) 2 © (si • m) © (ni) 2 
= si • (si © m) © (n 2 ) 2 = Si • Ri © (n 2 ) 2 = s 2 © (n 2 ) 2 

(and similar decompositions for the higher degree polynomials Rk)- 

The tensor products above can be directly translated to products of ideals, as the vector 
spaces above are each generated in a single degree (e.g. s k ,n k , are generated homogenously 
in degree k). To express this, we will define an ideal which corresponds to R\: 

Notation 3.8 We denote the ideal of R generated by all monomials of degree 1 by m = 
(T X ,...,T D ). 

Note that ideal m is generated by all elements in R±. Moreover, we have rtifc = Rk for all 
k > 1. Using m, one can directly translate products of vector spaces involving some Rk into 
products of ideals: 

Remark 3.9 The equality of vector spaces 

5jfc = 5i • (Ri) k 1 

translates to the equality of ideals 

st~)m k = s-m fc ~\ 

since both the left and right sides are homogenously generated in degree k. 



3.1 The Algorithm 



S c € D 

R = €[T 1 ,...T D ] 
Rk 

A( n ) = ^±i) 
s = {h,...J D ~d) 
s k = Rk n s 
n = l{S ± ) 
n k = Rk n n 
m=(Ti,...,r D ) 



l(S) 



(i- dimensional projection space 

Polynomial ring over C in D variables 

C- vector space of homogenous A;- forms in T\ , . . . , Td 

n-th triangular number 

The ideal of S, generated by linear polynomials t{ 
C-vector space of homogenous A;-forms vanishing on S 
The ideal of S 1 - 

C-vector space of homogenous A;-forms vanishing on S ,J 
The ideal of the origin in C D 



Table 1: Notation and important definitions 



In this section we present an algorithm for solving Problem 3.3 the computation of the 
radical of the ideal (gi, . . . , q m -i) under the assumption that 



m > A(D) - A(d) + 1. 
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Under those conditions, as we will prove in Remark 3.12 (iii), we have that 

(gi, . . .,q m -i) = s 2 . 

Using the notations previously defined, one can therefore infer that solving Problem 3.3 is 
equivalent to computing the radical s = y/$ ■ m in the sense of obtaining a linear generating 
set for s, or equivalent to finding a basis for Si when s 2 is given in an arbitrary basis. s 2 
contains the complete information given by the covariance matrices and Si gives an explicit 
linear description of the space of projections under which the random variables X\, . . . , X m 
agree. 



Algorithm 1 The input consists of the quadratic forms q±, . . . ,q m -i G R, generating s 2 , 
and the dimension d; the output is the linear generating set £±, ■ ■ ■ ,0-D-d f° r s i- 
1: Let 7r ^— (1 2 • • • D) be a transitive permutation of the variable indices {1, . . . , D} 
2: Let Q <— [qi ••• qw-i] be the ((m — 1) x A(D))-matrix of coefficient vectors, where 

every row corresponds to a polynomial and every column to a monomial T{Tj. 
3: for k = 1, . . . ,D — d do 

4: Order the columns of Q according to the lexicographical ordering of monomials TjTj 
with variable indices permuted by 7r fc , i.e. the ordering of the columns is given by 
the relation y as 

T l k (l) ^ T TT k (l) T TT k (2) >~ T TT k (l) T TT k (3) >~ " " " >~ T TT k (1) T TT k (D) >~ T l k (2) 
>~ T 7T k (2) T TT k (3) >-■■■>- ^fe(D-l) >~ T 7T k (D-l) T TT k (D) ^ T ^ k (D) 

5: Transform Q into upper triangular form Q' using Gaussian elimination 

6: The last non-zero row of Q' is a polynomial T^kijy\i, where I is a linear form in s, 

and we set tk <— t 
7: end for 



Algorithm [T] shows the procedure in pseudo-code; a summary of the notation defined in 
the previous section can be found in Table [T] The algorithm has polynomial complexity in 
the dimension d of the linear subspace S. 

Remark 3.10 Algorithm^ has average and worst case complexity 

0((A(D)-A(d)) 2 A(D)), 

In particular, if d is not considered as parameter of the algorithm, the average and the 
worst case complexity is 0(D 6 ). On the other hand, if A(D) — A(d) is considered a fixed 
parameter, then Algorithm 1 has average and worst case complexity 0(D 2 ). 

Proof This follows from the complexities of the elementary operations: upper triangu- 
larization of a generic matrix of rank r with m columns matrix needs 0(r 2 m) operations. 
We first perform triangularization of a rank A(D) — A(d) matrix with A(D) columns. The 
permutations can be obtained efficiently by bringing Q in row-echelon form and then per- 
forming row operations. Operations for extracting the linear forms and comparisons with 
respect to the monomial ordering are negligible. Thus the overall operation complexity to 
calculate s\ is 0((A{D) - A(d)) 2 A(D)). 



15 



KlRALY, VON BUNAU, MEINECKE, BLYTHE AND MULLER 



Note that the difference between worst- and average case lies at most in the coeffi- 
cients, since the inputs are generic and the complexity only depends on the parameter D 
and not on the qi. Thus, with probability 1, exactly the worst-case-complexity is attained. ■ 

There are two crucial facts which need to be verified for correctness of this algorithm. 
Namely, there are implicit claims made in Line [6] of Algorithm [T] First, it is claimed that 
the last non-zero row of Q' corresponds to a polynomial which factors into certain linear 
forms. Second, it is claimed that the I obtained in step [6] generate s resp. S\. The proofs of 
these non-trivial claims can be found in Proposition |3.11| in the next subsection. 

Dealing with additional linear forms /i, . . . , f m -i, is possible by way of a slight modifi- 
cation of the algorithm. Because the fi are linear forms, they are generators of s. We may 
assume that the fi are linearly independent. By performing Gaussian elimination before 
the execution of Algorithm [TJ we may reduce the number of variables by m — 1, thus having 
to deal with new quadratic forms in D — m + 1 instead of D variables. Also, the dimension 
of the space of projections is reduced to min(<i — m+ 1, —1). Setting D' = D — m + 1 and 
d! = min(d — m + 1, —1) and considering the quadratic forms q% with Gaussian eliminated 
variables, Algorithm [T] can be applied to the quadratic forms to find the remaining genera- 
tors for 5\. In particular, if m — 1 > d, then there is no need for considering the quadratic 
forms, since d linearly independent linear forms already suffice to determine the solution. 

We can also incorporate forms of higher degree corresponding to higher order cumulants. 
For this, we start with where k is the degree of the homogenous polynomials we get from 
the cumulant tensors of higher degree. Supposing we start with enough cumulants, we may 
assume that we have a basis of s^. Performing Gaussian elimination on this basis with 
respect to the lexicographical order, we obtain in the last row a form of type T^~^£, where 
£ is a linear form. Doing this for D — d permutations again yields a basis for si. 

Moreover, slight algebraic modifications of this strategy also allow to consider data 
from cumulants of different degree simultaneously, and to reduce the number of needed 
polynomials to 0(D); however, due to its technicality, this is beyond the scope of the 
paper. We sketch the idea: In the general case, one starts with an ideal 

Z = (/i> • • • ! fm), 

homogenously generated in arbitrary degrees, such that \/X = s. Proposition in the ap- 
pendix implies that this happens whenever m > D + 1. One then proves that due to the 
genericity of the fi, there exists an N such that 

2jV = SJV, 

which means that Si can again be obtained by calculating the saturation of the ideal X. 
When fixing the degrees of the fi, we will have N = 0(D) with a relatively small constant 
(for all fi quadratic, this even becomes iV = 0(y/~D)). So algorithmically, one would first 
calculate In = 5n, which then may be used to compute Si and thus s analogously to the 
case N = 2, as described above. 
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3.2 Proof of correctness 

In order to prove the correctness of Algorithm [TJ we need to prove the following three 
statements. 

Proposition 3.11 For Algorithm^it holds that 

(i) Q is of rank A(D) - A(d). 

(ii) The last column of Q in step 6 is of the claimed form. 
(hi) The £±, . . . ,£n-d generate S\. 

Proof This proposition will be proved successively in the following: (i) will follow from 
Remark |3.12 (hi); (ii) will be proved in Lemma 3.13 and (hi) will be proved in Proposi- 



tion EH ■ 

Let us hrst of all make some observations about the structure of the vector space s 2 in which 
we compute. It is the vector space of polynomials of homogenous degree 2 vanishing on S. 
On the other hand, we are looking for a basis £±, . . . , io-d of Si- The following remark will 
relate both vector spaces: 

Remark 3.12 The following statements hold: 

(i) s 2 is generated by the polynomials liTj, l<i<D — d,l<j<D,. 

(ii) dim c s 2 = A(D) - A(d) 

(hi) Let q±, . . . , q m with m > A(D) — A(d) be generic homogenous quadratic polynomials 
in 5. Then (q 1 ,...,q m ) = s 2 . 



Proof (i) In Remark 3.7, we have concluded that s 2 = S\ • R\. Thus the product vector 
space s 2 is generated by a product basis of Si and R\. Since Tj, 1 < j < D is a basis for R\, 



and £i,l < i < D — d is & basis for Si , the statement holds, (ii) In Remark 3.9 , we have seen 
that R2 = s 2 ® (rii) 2 , thus dims 2 = dimi? 2 — dim(ni) 2 . The vector space i? 2 is minimally 
generated by the monomials of degree 2 in T\, . . .Tjj, whose number is A(D). Similarly, 
(rii) 2 is minimally generated by the monomials of degree 2 in the variables ,£' d that 

form the dual basis to the l{. Their number is A(d), so the statement follows, (hi) As the 
qi are homogenous of degree two and vanish on S, they are elements in s 2 . Due to (ii), we 



can apply Proposition 2.9 to conclude that they generate s 2 as vector space. 



Now we continue to prove the remaining claims. 

Lemma 3.13 In Algorithm^ the (A(D) - A(d))-th row of Q' (the upper triangular form 
of Q) corresponds to a 2-form T v ^£ with a linear polynomial £ G s\. 

Proof Note that every homogenous polynomial of degree k is canonically an element of 
the vector space R^ in the monomial basis given by the Tj. Thus it makes sense to speak 
about the coefficients of T{ for an 1-form resp. the coefficients of T{Tj of a 2-form. 

Also, without loss of generality, we can take the trivial permutation it = id, since the 
proof will not depend on the chosen lexicographical ordering and thus will be naturally in- 
variant under permutations of variables. First we remark: since S is a generic d-dimensional 
linear subspace of C D , any linear form in si will have at least d + 1 non- vanishing coeffi- 
cients in the Tj. On the other hand, by displaying the generators £i, 1 < i < D — d in Si 
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in reduced row echelon form with respect to the Tj-basis, one sees that one can choose all 
the t{ in fact with exactly d + 1 non- vanishing coefficients in the Tj such that no nontrivial 
linear combination of the £i has less then d + 1 non-vanishing coefficients. In particular, 
one can choose the li such that the biggest (w.r.t. the lexicographical order) monomial with 
non- vanishing coefficient of l{ is Ti . 



Remark 3.12 (i) states that S2 is generated by 

kTj, 1 < i < D - d, 1 < j < D. 

Together with our above reasoning, this implies the following. 

Fact 1: There exist linear forms £i,l < i < D—d such that: the 2- forms £(Tj generate S2, 
and the biggest monomial of £(Tj with non-vanishing coefficient under the lexicographical 



ordering is TiTj. By Remark 3.12 (ii), the last row of the upper triangular form Q' is 
a polynomial which has zero coefficients for all monomials possibly except the A ((f) + 1 
smallest, 

1-D-dJ-D, -t D-d+li 1 D-d+l + D-d+2, ■ ■ ■ ± D- 

On the other hand, it is guaranteed by our genericity assumption that the biggest of those 
terms is indeed non- vanishing, which implies the following. 

Fact 2: The biggest monomial of the last row with non-vanishing coefficient (w.r.t the 
lexicographical order) is that of Td^^Td- 

Combining Facts 1 and 2, we can now infer that the last row must be a scalar multiple of 
£d-<iXd'- since the last row corresponds to an element of S2, it must be a linear combination 
of the hT-j. By Fact 1, every contribution of an iiTj, 7^ {D — d,D) would add a non- 
vanishing coefficient lexicographically bigger than To-^Td which cannot cancel. So, by Fact 
2, Tjj divides the last row of the upper triangular form of Q, which then must be T^io^d 
or a multiple thereof. Also we have that lo-d G s by definition. ■ 

It remains to be shown that by permutation of the variables we can find a basis for S\. 
Proposition 3.14 The £1, ■ ■ ■ ,io-d generate S\ as vector space and thus s as ideal. 
Proof Recall that tt 1 was the permutation to obtain £{. As we have seen in the proof of 



Lemma 3.13, li is a linear form which has non-zero coefficients only for the d + 1 coefficients 
T n i (£>_d), . . • ,T n i( D y Thus ti has a non-zero coefficient where all the £j,j < i have a zero 
coefficient, and thus £i is linearly independent from the £j,j < i. In particular, it follows 
that the £i are linearly independent in Ri. On the other hand, they are contained in the 
D — (f-dimensional sub-C-vector space Si and are thus a basis of S\, and also a generating 
set for the ideal s. ■ 

Note that all of these proofs generalize to /c-forms. For example, one calculates that 



dim c 5 k 



D + k-l\ /d + k-1 
k J { k 



and the triangularization strategy yields a last row which corresponds to T^,^l with a 
linear polynomial £ € Si 
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3.3 Relation to Previous Work in Computational Algebraic Geometry 

In this section, we discuss how the algebraic formulation of the cumulant comparison prob- 



lem given in Problem 3.3 relates to the classical problems in computational algebraic ge- 
ometry. 

Problem 3.3 confronts us with the following task: given polynomials qi, . . . , q m -i with 



special properties, compute a linear generating set for the radical ideal 

y/(qi,...,q m -i) = I(V(gi, . . . , g m -i)). 

Computing the radical of an ideal is a classical task in computational algebraic geometry, so 
our problem is a special case of radical computation of ideals, which in turn can be viewed 



as an instance of primary decomposition of ideals, see (Cox et al. 2007, 4.7). 



While it has been known for long time that there exist constructive algorithms to cal- 



culate the radical of a given ideal in polynomial rings Hermann (1926), only in the recent 



decades there have been algorithms feasible for implementation in modern computer alge- 



bra systems. The best known algorithms are those of Gianni et al. (1988), implemented in 



AXIOM and REDUCE, the algorithm of Eisenbud et al. (1992), implemented in Macaulay 



2, the algorithm of Caboara et al. (1997), currently implemented in CoCoA, and the al- 



gorithm of Krick and Logar (1991) and its modification by Laplagne (2006), available in 



SINGULAR. 

All of these algorithms have two points in common. First of all, these algorithms have 
computational worst case complexities which are doubly exponential in the square of the 



number of variables of the given polynomial ring, see (Laplagne, 2006, section 4.). Although 



the worst case complexities may not be approached for the problem setting described in the 
current paper, these off-the-shelf algorithms do not take into account the specific properties 
of the ideals in question. 

On the other hand, Algorithm [T] can be seen as a homogenous version of the well-known 
Buchberger algorithm to find a Groebner basis of the dehomogenization of s with respect 
to a degree-first order. Namely, due to our strong assumptions on m, or as is shown in 



Proposition B.27 in the appendix for a more general case, the homogenous saturations of the 
ideal (qi, . . . , q m -i) = ttt-s and the ideal s coincide. In particular, the dehomogenizations of 
the qi constitute a generating set for the dehomogenization of s. The Buchberger algorithm 
now finds a reduced Groebner basis of s which consists of exactly D — d linear polynomials. 
Their homogenizations then constitute a basis of homogenous linear forms of 5 itself. It can 
be checked that the first elimination steps which the Buchberger algorithm performs for 
the dehomogenizations of the qt correspond directly to the elimination steps in Algorithm [l] 
for their homogenous versions. So our algorithm performs similarly to the Buchberger 
algorithm in a noiseless setting, since both algorithms compute a reduced Groebner basis 
in the chosen coordinate system. 

However, in our setting which stems from real data, there is a second point which is more 
grave and makes the use of off-the-shelf algorithms impossible: the computability of an exact 
result completely relies on the assumption that the ideals given as input are exactly known, 
i.e. a generating set of polynomials is exactly known. This is not a problem in classical 
computational algebra; however, when dealing with polynomials obtained from real data, 
the polynomials come not only with numerical error, but in fact with statistical uncertainty. 
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In general, the classical algorithms are unable to find any solution when confronted even 
with minimal noise on the otherwise exact polynomials. Namely, when we deal with a 
system of equations for which over-determination is possible, any perturbed system will 
be over-determined and thus have no solution. For example, the exact intersection of 
N > D + 1 linear subspaces in complex D-space is always empty when they are sampled 
with uncertainty; this is a direct consequence of Proposition 2.10, when using the assumption 
that the noise is generic. However, if all those hyperplanes are nearly the same, then the 
result of a meaningful approximate algorithm should be a hyperplane close to all input 
hyperplanes instead of the empty set. 

Before we continue, we would like to stress a conceptual point in approaching uncer- 
tainty. First, as in classical numerics, one can think of the input as theoretically exact, but 
with fixed error e and then derive bounds on the output error in terms of this e and analyze 
their asymptotics. We will refer to this approach as numerical uncertainty, as opposed to 
statistical uncertainty, which is a view more common to statistics and machine learning, as 
it is more natural for noisy data. Here, the error is considered as inherently probabilistic 
due to small sample effects or noise fluctuation, and algorithms may be analyzed for their 
statistical properties, independent of whether they are themselves deterministic or stochas- 
tic. The statistical view on uncertainty is the one the reader should have in mind when 
reading this paper. 

Parts of the algebra community have been committed to the numerical viewpoint on 
uncertain polynomials: the problem of numerical uncertainty is for example extensively 
addressed in Stetter's standard book on numerical algebra (Stetter, 2004). The main dif- 
ficulties and innovations stem from the fact that standard methods from algebra like the 
application of Groebner bases are numerically unstable, see (Stetter 2004, chapter 4.1-2). 

Recently, the algebraic geometry community has developed an increasing interest in 
solving algebraic problems arising from the consideration of real world data. The algorithms 
in this area are more motivated to perform well on the data, some authors start to adapt 
a statistical viewpoint on uncertainty, while the influence of the numerical view is still 
dominant. As a distinction, the authors describe the field as approximate algebra instead 



of numerical algebra. Recent developments in this sense can be found for example in (Heldt 



et al. , 2009 ) or the book of Kreuzer et al. ( 2009 ) . We will refer to this viewpoint as the 



statistical view in order to avoid confusion with other meanings of approximate. 

Interestingly, there are significant similarities on the methodological side. Namely, in 
computational algebra, algorithms often compute primarily over vector spaces, which arise 
for example as spaces of polynomials with certain properties. Here, numerical linear algebra 
can provide many techniques of enforcing numerical stability, see the pioneering paper of 



Corless et al. (1995). Since then, many algorithms in numerical and approximate algebra 



utilize linear optimization to estimate vector spaces of polynomials. In particular, least- 
squares-approximations of rank or kernel are canonical concepts in both numerical and 
approximate algebra. 

However, to the best of our knowledge, there is to date no algorithm which computes 
an "approximate" (or "numerical") radical of an ideal, or an approximate saturation, and 
also none in our special case. In the next section, we will use estimation techniques from 
linear algebra to convert Algorithm [T] into an algorithm which can cope with the inherent 
statistical uncertainty of the estimation problem. 
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4. Approximate Algebraic Geometry on Real Data 

In this section we show how algebraic computations can be applied to polynomials with 
inexact coefficients obtained from estimated cumulants on finite samples. Note that our 
method for computing the approximate radical is not specific to the problem studied in this 
paper. 

The reason why we cannot directly apply our algorithm for the exact case to estimated 
polynomials is that it relies on the assumption that there exists an exact solution, such that 
the projected cumulants are equal, i.e. we can find a projection P such that the equalities 



hold exactly. However, when the elements of Si, ... , S m and . . . , /x m are subject to 
random fluctuations or noise, there exists no projection that yields exactly the same ran- 
dom variables. In algebraic terms, working with inexact polynomials means that the joint 
vanishing set of q±, ... , q m -i and /i, . . . , f m -i consists only of the origin G C D so that the 
ideal becomes trivial: 



Thus, in order to find a meaningful solution, we need to compute the radical approximately. 

In the exact algorithm, we are looking for a polynomial of the form T&1 vanishing on 
S, which is also a C-linear combination of the quadratic forms c/j. The algorithm is based 
on an explicit way to do so which works since the % are generic and sufficient in number. 
So one could proceed to adapt this algorithm to the approximate case by performing the 
same operations as in the exact case and then taking the (A(D) — A(d))-th row, setting 
coefficients not divisible by Tjj to zero, and then dividing out To to get a linear form. This 
strategy performs fairly well for small dimensions D and converges to the correct solution, 
albeit slowly. 

Instead of computing one particular linear generator as in the exact case, it is advisable 
to utilize as much information as possible in order to obtain better accuracy. The least- 
squares-optimal way to approximate a linear space of known dimension is to use singular 
value decomposition (SVD): with this method, we may directly eliminate the most insignif- 
icant directions in coefficient space which are due to fluctuations in the input. To that end, 
we first define an approximation of an arbitrary matrix by a matrix of fixed rank. 

Definition 4.1 Let A £ C mxn with singular value decomposition A = UDV*, where D = 
diag(<Ti, . . . , Up) G C pxp is a diagonal matrix with ordered singular values on the diagonal, 



For k < p, let D' = diag(ci, . . . , <Jk, 0, . . . , 0). Then the matrix A' = UD'V* is called rank 
k approximation of A. The null space, left null space, row span, column span of A' will be 
called rank k approximate null space, left null space, row span, column span of A. 

For example, if u\, . . . , u p and v\,...,v p are the columns of U and V respectively, the rank 
k approximate left null space of A is spanned by the rows of the matrix 



PSiP 1 = • • • = PS m P 1 and Pfn = ■ ■ ■ = P/i. 



rn 



(Qi, 



Qm-l,fl, ■ ■ ■ , fm-l) — m - 



Ol| > 1 02 1 > " " " > | Op | > 0. 



L = [u p _ fe+ i 
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and the rank k approximate row span of A is spanned by the rows of the matrix 

S = [vi ■■■ v p ] T . 

We will call those matrices the approximate left null space matrix resp. the approximate 
row span matrix of rank k associated to A. The approximate matrices are the optimal 
approximations of rank k with respect to the least-squares error. 

We can now use these concepts to obtain an approximative version of Algorithm [TJ 
Instead of searching for a single element of the form Tjj£, we estimate the vector space of 
all such elements via singular value decomposition — note that this is exactly the vector 
space ((To) -s) 2 , i.e. the vector space of all homogenous polynomials of degree two which 
are divisible by Tjj. Also note that the choice of the linear form Tjj is irrelevant, i.e. we 
may replace above by any variable or even linear form. As a trade-off between accuracy 
and runtime, we additionally estimate the vector spaces ((7d) -s) 2 for sill <i < D, and 
then least-squares average the putative results for s to obtain a final estimator for s and 
thus the desired space of projections. 

Algorithm 2 The input consists of noisy quadratic forms q±, . . . ,q m -i G Cp~i, . . . ,Tjj], 
and the dimension d; the output is an approximate linear generating set t\, . . . , tu-d for the 
ideal 5. 

1: Let Q <— [qi • • • (?m-i] T be the (m — 1 x A(D))-matrix of coefficient vectors, where 
every row corresponds to a polynomial and every column to a monomial T{Fj in arbitrary 
order. 

2: for i = 1, . . . ,D do 

3: Let Qi be the ((m — 1) x A(D) — D)-sub-matrix of Q obtained by removing all 

columns corresponding to monomials divisible by Tj 
4: Compute the approximate left null space matrix Li of Qi of rank (m — 1) — A(D) + 

A(d) + D-d 

5: Compute the approximate row span matrix of LiQ of rank D — d 
6: Let L'l be the (D — d x D)-matrix obtained from L\ by removing all columns corre- 
sponding to monomials not divisible by Tj 
7: end for 

8: Let L be the (D(D — d) x D)-matrix obtained by vertical concatenation of L", . . . , L" D 
9: Compute the approximate row span matrix A = [oi • • • a£)_d] T of L of rank D — d 
and let l\ = \T\ ■ ■ ■ To] for all 1 < i < D — d. 



We explain the logic behind the single steps: In the first step, we start with the same 
matrix Q as in Algorithm 1. Instead of bringing Q into triangular form with respect to the 
term order T\ -< ■ ■ ■ -< Tr> , we compute the left kernel space row matrix Si of the monomials 
not divisible by Tj. Its left image Li = SiQ is a matrix whose row space generates the 
space of possible last rows after bringing Q into triangular form in an arbitrary coordinate 
system. In the next step, we perform PCA to estimate a basis for the so-obtained vector 
space of quadratic forms of type T times linear form, and extract a basis for the vector 
space of linear forms estimated via Li. Now we can put together all Li and again perform 
PCA to obtain a more exact and numerically more estimate for the projection in the last 
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step. The rank of the matrices after PCA is always chosen to match the correct ranks in 
the exact case. 

Note that Algorithm [2] is a consistent estimator for the correct space of projections 
if the covariances are sample estimates. Let us first clarify in which sense consistent is 
meant here: If each covariance matrix is estimated from a sample of size N or greater, 
and N goes to infinity, then the estimate of the projection converges in probability to the 
true projection. The reason why Algorithm [2] gives a consistent estimator in this sense is 
elementary: covariance matrices can be estimated consistently, and so can their differences, 
the polynomials Moreover, the algorithm can be regarded as an almost continuous 
function in the polynomials so convergence in probability to the true projection and 
thus consistency follows from the continuous mapping theorem. 

The runtime complexity of Algorithm [2] is 0(D & ) as for Algorithm[l] For this note that 
calculating the singular value decomposition of an m x n-matrix is 0(mnma.x(m,n)). 

If we want to consider k- forms instead of 2-forms, we can use the same strategies as 
above to numerically stabilize the exact algorithm. In the second step, one might want 
to consider all sub-matrices Qm of Q obtained by removing all columns corresponding to 
monomials divisible by some degree (k — 1) monomial M and perform the for- loop over 
all such monomials or a selection of them. Considering D monomials or more gives again 
a consistent estimator for the projection. Similarly, these methods allow us to numeri- 
cally stabilize versions with reduced epoch requirements and simultaneous consideration of 
different degrees. 



5. Numerical Evaluation 



In this section we evaluate the performance of the algebraic algorithm on synthetic data 
in various settings. In order to contrast the algebraic approach with an optimization- 
based method (cf. Figure [l]), we compare with the Stationary Subspace Analysis (SSA) 



algorithm ( von Biinau et al. , 2009 ) , which solves a similar problem in the context of time 



series analysis. To date, SSA has been successfully applied in the context of biomedical 
data analysis (von Biinau et al. , 2010), domain adaptation (Hara et al. 2010), change-point 



detection ( von Biinau et al. 2009 ) and computer vision ( Meinecke et al. 2009 ) 



5.1 Stationary Subspace Analysis 



Stationary Subspace Analysis (von Biinau et al. , 2009 Miiller et al. , 2011 ) factorizes an ob 



served time series according to a linear model into underlying stationary and non-stationary 
sources. The observed time series x(t) G H D is assumed to be generated as a linear mixture 
of stationary sources s s (t) G R d and non-stationary sources s n (t) G M, D ~ d , 



it) = As(t) = [A s A n ] 



s*(t) 



(6) 



with a time-constant mixing matrix A. The underlying sources s(t) are not assumed to be 
independent or uncorrelated. 

The aim of SSA is to invert this mixing model given only samples from x(t). The true 
mixing matrix A is not identifiable (von Biinau et al. 2009); only the projection P G IR rfxD 
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to the stationary sources can be estimated from the mixed signals x(t), up to arbitrary linear 
transformation of its image. The estimated stationary sources are given by s s (t) = Px(t), 
i.e. the projection P eliminates all non-stationary contributions: PA n = 0. 



The SSA algorithms (von Biinau et al. 2009 Hara et al. , 2010 ) are based on the following 



definition of stationarity: a time series Xt is considered stationary if its mean and covariance 
is constant over time, i.e. EfA"^] = E[Xt 2 ] and EfA^A^] = EfA^X^jJ for all pairs of 
time points t±,t2 G IN. Following this concept of stationarity, the projection P is found by 
minimizing the difference between the first two moments of the estimated stationary sources 
s s {t) across epochs of the times series. To that end, the samples from x(t) are divided into 
m non-overlapping epochs of equal size, corresponding to the index sets 71, . . . ,T m , from 
which the mean and the covariance matrix is estimated for all epochs 1 < i < m, 



1 

m 



teTi 



and 



1 



teTi 



Given a projection P, the mean and the covariance of the estimated stationary sources 
in the i-th epoch are given by /if = Pfn and Ef = PSjP T respectively. Without loss of 
generality (by centering and whitening^] the average epoch) we can assume that s s {t) has 
zero mean and unit covariance. 

2009) minimizes the 



The objective function of the SSA algorithm (von Biinau et al. 



sum of the differences between each epoch and the standard normal distribution, measured 
by the Kullback-Leibler divergence -Dkl between Gaussians: the projection P* is found as 
the solution to the optimization problem, 



P* = argmin V D Kh \N{^ Sf) JV(0, 1) 
PP T =I ^ 



argmin ( — log det Sf + (/if) 
pp t =i 



i=l 



which is non-convex and solved using an iterative gradient-based procedure. 

This SSA algorithm considers a problem that is closely related to the one addressed 
in this paper, because the underlying definition of stationarity does not consider the time 
structure. In essence, the m epochs are modeled as m random variables Xi, . . . ,X m for 
which we want to find a projection P such that the projected probability distributions 
PX\, . . . , PX m are equal, up to the first two moments. This problem statement is equivalent 
to the task that we solve algebraically. 



5.2 Results 

In our simulations, we investigate the influence of the noise level and the number of dimen- 
sions on the performance and the runtime of our algebraic algorithm and the SSA algorithm. 
We measure the performance using the subspace angle between the true and the estimated 
space of projections S. 

2. A whitening transformation is a basis transformation W that sets the sample covariance matrix to the 
identity. It can be obtained from the sample covariance matrix S as W — S" 5 
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The setup of the synthetic data is as follows: we fix the total number of dimensions to 
D = 10 and vary the dimension d of the subspace with equal probability distribution from 
one to nine. We also fix the number of random variables to m = 110. For each trial of the 
simulation, we need to choose a random basis for the two subspaces H D = S © S^, and 
for each random variable, we need to choose a covariance matrix that is identical only on 
S. Moreover, for each random variable, we need to choose a positive definite disturbance 
matrix (with given noise level a), which is added to the covariance matrix to simulate the 
effect of finite or noisy samples. 

The elements of the basis vectors for S and S 1 - are drawn uniformly from the interval 
(—1,1). The covariance matrix of each epoch 1 < i < m is obtained from Cholesky factors 
with random entries drawn uniformly from ( — 1,1), where the first d rows remain fixed 
across epochs. This yields noise-free covariance matrices C\, . . . ,C m G M, DxD where the 
first (d x (i)-block is identical. Now for each d, we generate a random disturbance matrix 
Ei to obtain the final covariance matrix 

C\ = Ci + Ei. 

The disturbance matrix Ei is determined as Ei = ViDiV^ where Vi is a random orthogonal 
matrix, obtained as the matrix exponential of an antisymmetric matrix with random ele- 
ments and Di is a diagonal matrix of eigenvalues. The noise level a is the log-determinant 
of the disturbance matrix Ei. Thus the eigenvalues of Di are normalized such that 

1 10 

— J2 l °sDu = a. 

i=l 

In the final step of the data generation, we transform the disturbed covariance matrices 
C[, . . . , C' m into the random basis to obtain the cumulants Si, ... , E m which are the input 
to our algorithm. 

Noise level: 10" 4 Noise level: 10" 3 Noise level: 1 CT 2 Noise level: 10" 1 
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Figure 5: Comparison of the algebraic algorithm and the SSA algorithm. Each panel shows 
the median error of the two algorithms (vertical axis) for varying numbers of 
stationary sources in ten dimensions (horizontal axis). The noise level increases 
from the left to the right panel; the error bars extend from the 25% to the 75% 
quantile estimated over 2000 random realizations of the data set. 

The first set of results is shown in Figure § With increasing noise levels (from left to 
right panel) both algorithms become worse. For low noise levels, the algebraic method yields 
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significantly better results than the optimization-based approach, over all dimensionalities. 
For medium and high-noise levels, this situation is reversed. 




-5 -4 -3 -2 -1 -0.1 123456789 

Noise level (Iog10) Number ol stationary sources 

Figure 6: The left panel shows a comparison of the algebraic method and the SSA algorithm 
over varying noise levels (five stationary sources in ten dimensions) , the two curves 
show the median log error. The right panel shows a comparison of the runtime 
for varying numbers of stationary sources. The error bars extend from the 25% 
to the 75% quantile estimated over 2000 random realizations of the data set. 

In the left panel of Figure [6j we see that the error level of the algebraic algorithm 
decreases with the noise level, converging to the exact solution when the noise tends to 
zero. In contrast, the error of original SSA decreases with noise level, reaching a minimum 
error baseline which it cannot fall below. In particular, the algebraic method significantly 
outperforms SSA for low noise levels, whereas SSA is better for high noise. However, when 
noise is too high, none of the two algorithms can find the correct solution. In the right 
panel of Figure [6j we see that the algebraic method is significantly faster than SSA. 
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6. Conclusion 

In this paper we have shown how a learning problem formulated in terms of cumulants 
of probability distributions can be addressed in the framework of computational algebraic 
geometry. As an example, we have demonstrated this viewpoint on the problem of finding 
a linear map P G M, dxD such that a set of projected random variables X\, . . . ,X m £ M, D 
have the same distribution, 

PX\ ~ • • • ~ PX m . 

To that end, we have introduced the theoretical groundwork for an algebraic treatment of 
inexact cumulants estimated from data: the concept of polynomials that are generic up to 
a certain property that we aim to recover from data. In particular, we have shown how 
we can find an approximate exact solution to this problem using algebraic manipulation of 
cumulants estimated on samples drawn from X\ , . . . , X m . Therefore we have introduced the 
notion of computing an approximate saturation of an ideal that is optimal in a least-squares 
sense. Moreover, using the algebraic problem formulation in terms of generic polynomials, 
we have presented compact proofs for a condition on the identifiability of the true solution. 

In essence, instead of searching the surface of a non-convex objective function involving 
the cumulants, the algebraic algorithm directly finds the solution by manipulating cumulant 
polynomials — which is the more natural representation of the problem. This viewpoint 
is not only theoretically appealing, but conveys practical advantages that we demonstrate 



in a numerical comparison to Stationary Subspace Analysis (von Biinau et al. 2009): the 
computational cost is significantly lower and the error converges to zero as the noise level 
goes to zero. However, the algebraic algorithm requires m > A(D) random variables with 
distinct distributions, which is quadratic in the number of dimensions D. This is due to 
the fact that the algebraic algorithm represents the cumulant polynomials in the vector 
space of coefficients. Consequently, the algorithm is confined to linearly combining the 
polynomials which describe the solution. However, the set of solutions is also invariant 
under multiplication of polynomials and polynomial division, i.e. the algorithm does not 
utilize all information contained in the polynomial equations. We conjecture that we can 
construct a more efficient algorithm, if we also multiply and divide polynomials. 

The theoretical and algorithmic techniques introduced in this paper can be applied to 
other scenarios in machine learning, including the following examples. 

• Finding properties of probability distributions. Any inference problem that can 
be formulated in terms of polynomials, in principle, amenable our algebraic approach; 
incorporating polynomial constraints is also straightforward. 

• Approximate solutions to polynomial equations. In machine learning, the 
problem of solving polynomial equations can e.g. occur in the context of finding the 
solution to a constrained nonlinear optimization problem by means of setting the 
gradient to zero. 

• Conditions for identifiability. Whenever a machine learning problem can be for- 
mulated in terms of polynomials, identifiability of its generative model can also be 
phrased in terms of algebraic geometry, where a wealth of proof techniques stands at 
disposition. 
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We argue for a cross-fertilization of approximate computational algebra and machine 
learning: the former can benefit from the wealth of techniques for dealing with uncertainty 
and noisy data; the machine learning community may find a novel framework for represent- 
ing learning problems that can be solved efficiently using symbolic manipulation. 
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Appendix A. An example 

In this section, we will show by using a concrete example how the Algorithms [T] and [2] work. 
The setup will be the similar to the example presented in the introduction. We will use the 
notation introduced in Section [3l 

Example A.l In this example, let us consider the simplest non-trivial case: Two random 
variables Xi,X 2 in 1R 2 such that there is exactly one direction w G R 2 such that w T X\ = 
w T X2- I.e. the total number of dimensions is D = 2, the dimension of the set of projections 
is d = 1. As in the beginning of Section [3J we may assume that R 2 = S © S 1 - is an 
orthogonal sum of a one-dimensional space of projections S and its orthogonal complement 
S ± . In particular, S 1 - is given as the linear span of a single vector, say [a /?] . The space 

S is also the linear span of the vector \fi — a] T . 

Now we partition the sample into D(D + l)/2 — d(d + l)/2 = 2 epochs (this is the lower 



bound needed by Proposition 3.11). From the two epochs we can estimate two covariance 



matrices Ei,£2- Suppose we have 



Si 



an ai2 

<221 0,22 



(7) 



From this matrices, we can now obtain a polynomial 
qi = w T (Y,\ — I)w 



w 



an - 1 ai2 
021 a 2 2 - 1 



w 



(an - l)Tf + (012 + a 2 i)TiT 2 + (a 22 - 1)T 2 2 , 



(8) 



where w = \T\ T2] . Similarly, we obtain a polynomial (72 as the Gram polynomial of 
t 2 -I. 

First we now illustrate how Algorithm [TJ which works with homogenous exact polyno- 
mials, can determine the vector space S from these polynomials. For this, we assume that 
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the estimated polynomials are exact; we will discuss the approximate case later. We can 
also write q\ and q 2 in coefficient expansion: 

gi = gulf + qviT\Ti + gi 3 r 2 2 
92 = Q2iT? + 922^1 T 2 + g 2 3^f . 

We can also write this formally in the (2 x 3) coefficient matrix Q = (qij)ij, where the 
polynomials can be reconstructed as the entries in the vector 

Q-[lf TiT 2 T|] T . 

Algorithm [I] now calculates the upper triangular form of this matrix. For polynomials, this 
is equivalent to calculating the last row 

92191 - 91192 

= [921912 - 9n922]7iT 2 + [921913 - 9li923]?f. 
Then we divide out T 2 and obtain 

P = [921912 - 9ll922]?l + [921913 - 9ll923]72. 

The algorithm now identifies S 1 - as the vector space spanned by the vector 

[a /3] T = [921912 - 911922 921913 - 911923] T • 

This already finishes the calculation given by Algorithm [TJ as we now explicitly know the 
solution 

[a /3] T 

To understand why this strategy works, we need to have a look at the input. Namely, one 
has to note that q\ and q 2 are generic homogenous polynomials of degree 2, vanishing on 
S. That is, we will have q%{x) = for i = 1,2 and all points x € S. It is not difficult to see 
that every polynomial fulfilling this condition has to be of the form 

(ar 1 +/3T 2 )(aTi + 6T 2 ) 

for some a, b E C; i.e. a multiple of the equation defining S. However we may not know this 
factorization a priori, in particular we are in general agnostic as to the correct values of a 
and /3. They have to be reconstructed from the qi via an algorithm. Nonetheless, a correct 
solution exists, so we may write 

Ql = (aT 1 +/3T 2 )(a 1 X + 6 1 T 2 ) 
q 2 = {aT 1 +pT 2 ){a 2 X + b 2 T 2 ), 

with di,bi generic, without knowing the exact values a priori. If we now compare to the 
above expansion in the , we obtain the linear system of equations 

qn = aai 

q i2 = abi + f3ai 

9i3 = 
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for i = l,2, from which we may reconstruct the a,i, bi and thus a and /3. However, a more 
elegant and general way of getting to the solution is to bring the matrix Q as above into 
triangular form. Namely, by assumption, the last row of this triangular form corresponds to 
the polynomial P which vanishes on S. Using the same reasoning as above, the polynomial 
P has to be a multiple of {aT\ +/3T2). To check the correctness of the solution, we substitute 
the qij in the expansion of P for cij, 6j, and obtain 

P =[<?2i<?i2 - qnq22}T 1 T 2 + [<?2i<?i3 - qilQ22,]T£ 

= [aa2(abi + (3ai) — aa\{ab2 + (3ci2)]TiT2 + [aa2/36i — otaiflbjfr^ 
= [a 2 a,2bi — a 2 aib2]TiT2 + [af3a2b\ — afia^XT 2 
={aTi + l3T 2 )a[a 2 bi - aib 2 ]T 2 . 

This is (aTi + fiT?) times T2 up to a scalar multiple - from the coefficients of the form P, 
we may thus directly reconstruct the vector [a m up to a common factor and thus obtain 
a representation for S, since the calculation of these coefficients did not depend on a priori 
knowledge about S. 

If the estimation of the and thus of the qt is now endowed with noise, and we have 
more than two epochs and polynomials, Algorithm [2] provides the possibility to perform 
this calculation approximately. Namely, Algorithm [2] finds a linear combination of the qi 
which is approximately of the form Tpi with a linear form £ in the variables T\,T 2 . The 
Young-Eckart Theorem guarantees that we obtain a consistent and least-squares-optimal 
estimator for P, similarly to the exact case. The reader is invited to check this by hand as 
an exercise. 

Now the observant reader may object that we may have simply obtained the linear form 
(aTi + PT2) and thus S directly from factoring q\ and q 2 and taking the unique common 
factor. This is true, but this strategy can only be applied in the very special case D — d= 1. 
To illustrate the additional difficulties in the general case, we repeat the above example for 
D = 4 and d = 2 for the exact case: 

Example A. 2 In this example, we need already D{D + l)/2 — d(d + l)/2 = 7 polynomials 
qi, . . . , qj to solve the problem with Algorithm [TJ As above, we can write 

qi =qnT± + qi 2 T 1 T 2 + q a TiT^ + q^T x T^ + gj 5 T 2 2 

+ qieT 2 T 3 + qttTiT^ + q i8 T 2 + q^T^Ti + qi,ioT$ 



for i = 1, . . . ,7, and again we can write this in a (7 x 10) coefficient matrix Q = (qij)ij. In 
Algorithm [TJ this matrix is brought into triangular form. The last row of this triangular 
matrix will thus correspond to a polynomial of the form 

P = P7T2T4 + p 8 T 2 + P9T3T4 + piqT| 



A polynomial of this form is not divisible by T4 in general. However, Proposition 3.11 
guarantees us that the coefficient ps is always zero due to our assumptions. So we can 
divide out T4 to obtain a linear form 

P7T2 + pgT 3 + P10T4. 
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This is one equation denning the linear space S. One obtains another equation in the 
variables T\, T2, T3 if one, for example, inverts the numbering of the variables 1 — 2 — 3 — 4 
to 4 — 3 — 2 — 1. Two equations suffice to describe S, and so Algorithm [T] yields the correct 
solution. 

As in the example before, it can be checked by hand that the coefficient p-? indeed 
vanishes, and the obtained linear equations define the linear subspace S. For this, one has 
to use the classical result from algebraic geometry that every qi can be written as 

q i =£ 1 P 1 +£ 2 P2, 

where the li are fixed but arbitrary linear forms defining S as their common zero set, and 
the Pi are some linear forms determined by q% and the ii (this is for example a direct 
consequence of Hilbert's Nullstellensatz). Caution is advised as the equations involved 
become very lengthy - while not too complex - already in this simple example. So the 
reader may want to check only that the coefficient p% vanishes as claimed. 



Appendix B. Algebraic Geometry of Genericity 

In the paper, we have reformulated a problem of comparing probability distributions in al- 
gebraic terms. For the problem to be well-defined, we need the concept of genericity for the 
cumulants. The solution can then be determined as an ideal generated by generic homoge- 
nous polynomials vanishing on a linear subspace. In this supplement, we will extensively 
describe this property which we call genericity and derive some simple consequences. 

Since genericity is an algebraic-geometric concept, knowledge about basic algebraic ge- 
ometry will be required for an understanding of this section. In particular, the reader 
should be at least familiar with the following concepts before reading this section: Poly- 
nomial rings, ideals, radicals, factor rings, algebraic sets, algebra-geometry correspondence 
(including Hilbert's Nullstellensatz), primary decomposition, height resp. dimension theory 
in rings. A good introduction into the necessary framework can be found in the book of 



Cox et al. (2007) 



B.l Definition of genericity 

In the algebraic setting of the paper, we would like to calculate the radical of an ideal 

t — (<7i> • • • ! Qm-l,fl, • • • , fm-l)- 

This ideal X is of a special kind: its generators are random, and are only subject to the 
constraints that they vanish on the linear subspace S to which we project, and that they 
are homogenous of fixed degree. In order to derive meaningful results on how X relates to 
S, or on the solvability of the problem, we need to model this kind of randomness. 

In this section, we introduce a concept called genericity. Informally, a generic situation 
is a situation without pathological degeneracies. In our case, it is reasonable to believe that 
apart from the conditions of homogeneity and the vanishing on S, there are no additional 
degeneracies in the choice of the generators. So, informally spoken, the ideal X is generated 
by generic homogenous elements vanishing on S. This section is devoted to developing a 
formal theory in order to address such generic situations efficiently. 
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The concept of genericity is already widely used in theoretical computer science, combi- 
natorics or discrete mathematics; there, it is however often defined inexactly or not at all, 
or it is only given as an ad-hoc definition for the particular problem. On the other hand, 
genericity is a classical concept in algebraic geometry, in particular in the theory of mod- 
uli. The interpretation of generic properties as probability-one-properties is also a known 
concept in applied algebraic geometry, e.g. algebraic statistics. However, the application 
of probability distributions and genericity to the setting of generic ideals, in particular in 
the context of conditional probabilities, are original to the best of our knowledge, though 



not being the first one to involve generic resp. general polynomials, see (Iarrobino, 1984). 



Generic polynomials and ideals have been also studied in (Froberg and Hollman, 1994). A 



collection of results on generic polynomials and ideals which partly overlap with ours may 



also be found in the recent paper (Pardue 2010). 



Before continuing to the definitions, let us explain what genericity should mean. In- 
tuitively, generic objects are objects without unexpected pathologies or degeneracies. For 
example, if one studies say n lines in the real plane, one wants to exclude pathological cases 
where lines lie on each other or where many lines intersect in one point. Having those cases 
excluded means examining the "generic" case, i.e. the case where there are n(n + l)/2 
intersections, n{n + 1) line segments and so forth. Or when one has n points in the plane, 
one wants to exclude the pathological cases where for example there are three affinely de- 
pendent points, or where there are more sophisticated algebraic dependencies between the 
points which one wants to exclude, depending on the problem. 

In the points example, it is straightforward how one can define genericity in terms 
of sampling from a probability distribution: one could draw the points under a suitable 
continuous probability distribution from real two-space. Then, saying that the points are 
"generic" just amounts to examine properties which are true with probability one for the 
n points. Affine dependencies for example would then occur with probability zero and are 
automatically excluded from our interest. One can generalize this idea to the lines example: 
one can parameterize the lines by a parameter space, which in this case is two-dimensional 
(slope and ordinate), and then sample lines uniformly distributed in this space (one has 
of course to make clear what this means). For example, lines lying on each other or more 
than two lines intersecting at a point would occur with probability zero, since the part of 
parameter space for this situation would have measure zero under the given probability 
distribution. 

When we work with polynomials and ideals, the situation gets a bit more complicated, 
but the idea is the same. Polynomials are uniquely determined by their coefficients, so they 
can naturally be considered as objects in the vector space of their coefficients. Similarly, an 
ideal can be specified by giving the coefficients of some set of generators. Let us make this 
more explicit: suppose first we have given a single polynomial / G C[Ai, . . . Xjj] of degree 
k. 

In multi-index notation, we can write this polynomial as a finite sum 

f=j2 CaX ° with ° a G c - 
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This means that the possible choices for / can be parameterized by the ( D ^ k ) coefficients cj 
with ||/||i < k. Thus polynomials of degree k with complex coefficients can be parameterized 
by complex ( D ^ h ) -space. 

Algebraic sets can be similarly parameterized by parameterizing the generators of the 
corresponding ideal. However, this correspondence is highly non-unique, as different gener- 
ators may give rise to the same zero set. While the parameter space can be made unique by 
dividing out redundancies, which gives rise to the Hilbert scheme, we will instead use the 
redundant, though pragmatic characterization in terms of a finite dimensional vector space 
over C of the correct dimension. 

We will now fix notation for the parameter space of polynomials and endow it with 
algebraic structure. The extension to ideals will then be derived later. Let us write Mk for 
complex ( D fr k ) -space (we assume D as fixed), interpreting it as a parameter space for the 
polynomials of degree k as shown above. Since the parameter space Ai k is isomorphic to 
complex { D fr k ) -space, we may speak about algebraic sets in Mk- Also, A4k carries the com- 
plex topology induced by the topology on K, 2fc and by topological isomorphy the Lebesgue 
measure; thus it also makes sense to speak about probability distributions and random 
variables on Mk- This dual interpretation will be the main ingredient in our definition of 
genericity, and will allow us to relate algebraic results on genericity to the probabilistic 
setting in the applications. As Mk is a topological space, we may view any algebraic set in 
Mk as an event if we randomly choose a polynomial in Mk'- 

Definition B.l Let X be a random variable with values in Mk- Then an event for X 
is called algebraic event or algebraic property if the corresponding event set in Mk is an 
algebraic set. It is called irreducible if the corresponding event set in Mk is an irreducible 
algebraic set. 

If an event A is irreducible, this means that if we write A as the event U A\ and A^ , for 
algebraic events Ai,A2, then A = Ai, or A = A^. We now give some examples for algebraic 
properties. 

Example B.2 The following events on Mk are algebraic: 

1. The sure event. 

2. The empty event. 

3. The polynomial is of degree n or less. 

4. The polynomial vanishes on a prescribed algebraic set. 

5. The polynomial is contained in a prescribed ideal. 

6. The polynomial is homogenous. 

7. The polynomial is a square. 

8. The polynomial is reducible. 
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Properties 1-5 are additionally irreducible. 

We now show how to prove these claims: 1-2 are clear, we first prove that properties 
3-5 are algebraic and irreducible. By definition, it suffices to prove that the subset of Mk 
corresponding to those polynomials is an irreducible algebraic set. We claim: in any of 
those cases, the subset in question is moreover a linear subspace, and thus algebraic and 
irreducible. This can be easily verified by checking directly that if fx, fi fulfill the property 
in question, then f\ + a/2 also fulfills the property. 

Property 6 is algebraic, since it can be described as the disjunction of the properties 
"The polynomial is homogenous and of degree n" for all n < k. Those single properties 
can be described by linear subspaces of Mk as above, thus property 6 is parameterized by 
the union of those linear subspaces. In general, these are orthogonal, so property 6 is not 
irreducible. 

Property 7 is algebraic, as we can check it through the vanishing of a system of gener- 
alized discriminant polynomials. One can show that it is also irreducible since the subset 
of Mk in question corresponds to the image of a Veronese map (homogenization to degree 
k is a strategy); however, since we will not need such a result, we do not prove it here. 

Property 8 is algebraic, since factorization can also be checked by sets of equations. 
One has to be careful here though, since those equations depend on the degrees of the 
factors. For example, a polynomial of degree 4 may factor into two polynomials of degree 1 
and 3, or in two polynomials of degree 2 each. Since in general each possible combination 
defines different sets of equations and thus different algebraic subsets of Mk, property 8 is 
in general not irreducible (for k < 3 it is). 

The idea defining a choice of polynomial as generic follows the intuition of the affirmed 
non-sequitur: a generic, resp. generically chosen polynomial should not fulfill any algebraic 
property. A generic polynomial, having a particular simple (i.e. irreducible) algebraic prop- 
erty, should not fulfill any other algebraic property which is not logically implied by the first 
one. Here, algebraic properties are regarded as the natural model for restrictive and degen- 
erate conditions, while their logical negations are consequently interpreted as generic, as we 
have seen in Example |B.2 These considerations naturally lead to the following definition 



of genericity in a probabilistic context: 

Definition B.3 Let X be a random variable with values in Mk- Then X is called generic, 
if for any irreducible algebraic events A,B, the following holds: 

The conditional probability Px(A\B) exists and vanishes if and only if B does not imply 

A. 

In particular, B may also be the sure event. 

Note that without giving a further explication, the conditional probability Px(A\B) is 
not well-defined, since we condition on the event B which has probability zero. There is also 
no unique way of remedying this, as for example the Borel-Kolmogorov paradox shows. In 



section B.2| we will discuss the technical notion which we adopt to ensure well-definedness. 



Intuitively, our definition means that an event has probability zero to occur unless it is 
logically implied by the assumptions. That is, degenerate dependencies between events do 
not occur. 
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For example, non-degenerate multivariate Gaussian distributions or Gaussian mixture 
distributions on Ai^ are generic distributions. More general, any positive continuous prob- 
ability distribution which can be approximated by Gaussian mixtures is generic (see Ex- 
ample B.9). Thus we argue that non-generic random variables are very pathological cases. 
Note however, that our intention is primarily not to analyze the behavior of particular 
fixed generic random variables (this is part of classical statistics). Instead, we want to infer 
statements which follow not from the particular structure of the probability function, but 
solely from the fact that it is generic, as these statements are intrinsically implied by the 
conditional postulate in Definition B.3 alone. We will discuss the definition of genericity 



and its implications in more detail in section B.2 



With this definition, we can introduce the terminology of a generic object: it is a generic 
random variable which is object- valued. 



Definition B.4 We call a generic random variable with values in A4k a generic polyno- 
mial of degree k. When the degree k is arbitrary, but fixed (and still > 1), we will say that 
f is a generic polynomial, or that f is generic, if it is clear from the context that f is a 
polynomial. If the degree k is zero, we will analogously say that f is a generic constant. 



We call a set of constants or polynomials fi,...,f m generic if they are generic and 
independent. 



We call an ideal generic if it is generated by a set of m generic polynomials. 
We call an algebraic set generic if it is the vanishing set of a generic ideal. 



Let V be an algebraic property on a polynomial, a set of polynomials, an ideal, or an 
algebraic set (e.g. homogenous, contained in an ideal et.). We will call a polynomial, a set 
of polynomials, or an ideal, a generic V polynomial, set, or ideal, if it the conditional of a 
generic random variable with respect to V . 

If A is a statement about an object (polynomial, ideal etc), andV an algebraic property, 
we will say briefly "A generic V object is A" instead of saying "A generic V object is A 
with probability one". 

Note that formally, these objects are all polynomial, ideal, algebraic set etc -valued ran- 
dom variables. By convention, when we state something about a generic object, this will 
be an implicit probability-one statement. For example, when we say 



A generic green ideal is blue" , 



this is an abbreviation for the by definition equivalent but more lengthy statement 

"Let /i, . . . , f m be independent generic random variables with values in M. ^ , . . . , M.k m ■ 
If the ideal (/i, . . . , f m ) is green, then with probability one, it is also blue - this statement 
is independent of the choice of the ki and the choice of which particular generic random 
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variables we use to sample. 



On the other hand, we will use the verb "generic" also as a qualifier for "constituting 
generic distribution" . So for example, when we say 



"The Z of a generic red polynomial is a generic yellow polynomial" , 



this is an abbreviation of the statement 



"Let X be a generic random variable on Aik, let X' be the yellow conditional of X. 
Then the Z of X' is the red conditional of some generic random variable - in particular this 
statement is independent of the choice of k and the choice of X." 

It is important to note that the respective random variables will not be made explicit 
in the following subsections, since the statements will rely only on its property of being 
generic, and not on its particular structure which goes beyond being generic. 

As an application of these concepts, we may now formulate the problem of comparing 
cumulants in terms of generic algebra: 

Problem B.5 Let 5 = l(S), where S is an unknown d- dimensional subspace of G D . Let 

Z = (/lj ■ ■ • i fm) 

with fi G s generic of fixed degree each (in our case, one and two), such that y/X = s. 

Then determine a reduced Groebner basis (or another simple generating system) for s. 

As we will see, genericity is the right concept to model random sampling of polynomials, 
as we will derive special properties of the ideal I which follow from the genericity of the fi. 



B.2 Zero-measure conditionals, and relation to other types of genericity 



In this section, se will discuss the definition of genericity in Definition B.3 and ensure its 



well-definedness. Then we will invoke alternative definitions for genericity and show their 



relation to our probabilistic intuitive approach from section B.l As this section contains 
technical details and is not necessary for understanding the rest of the appendix, the reader 
may opt to skip it. 



An important concept in our definition of genericity in Definition |B.3 is the conditional 



probability Px(A\B). As B is an algebraic set, its probability Px(B) is zero, so the Bayesian 
definition of conditional cannot apply. There are several ways to make it well-defined; in the 



following, we explain the Definition of conditional we use in Definition B.3 The definition 
of conditional we use is one which is also often applied in this context. 

Remark B.6 Let X be a real random variable (e.g. with values in Mk) with probability 
measure /i. If fi is absolutely continuous, then by the theorem of Radon- Nikodym, there is 
a unique continuous density p such that 



H{U) = pdX 



u 
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for any E 'or 'el-measurable set U and the Lebesgue measure A. If we assume that p is a 
continuous function, it is unique, so we may define a restricted measure \±b on the event 
set of B by setting 

v(U) = [ pdH, 
Ju 

for Borel subsets of U and the Hausdorff measure H on B. If v{B) is finite and non-zero, 
i.e. v is absolutely continuous with respect to H, then it can be renormalized to yield a 
conditional probability measure /j>(.)\b = u {-) l u {B)- The conditional probability Px(A\B) 
has then to be understood as 



Px(A\B)= [ l(AnB)dn\ B , 

JB 



whose existence in particular implies that the Lebesgue integrals v(B) are all finite and 
non-zero. 

As stated, we adopt this as the definition of conditional probability for algebraic sets 
A and B. It is important to note that we have made implicit assumptions on the random 



variable X by using the conditionals Px(A\B) in Remark B.6 (and especially by assuming 



that they exist): namely, the existence of a continuous density function and existence, 



finiteness, and non-vanishing of the Lebesgue integrals. Similarly, by stating Definition B.3 
for genericity, we have made similar assumptions on the generic random variable X, which 
can be summarized as follows: 

Assumption B.7 X is an absolutely continuous random variable with continuous density 
function p, and for every algebraic event B, the Lebesgue integrals 



B 



pdH, 

where H is the Hausdorff measure on B, are non-zero and finite. 

This assumption implies the existence of all conditional probabilities Px(A\B) in Defi- 



nition B.3, and are also necessary in the sense that they are needed for the conditionals to 
be well-defined. On the other hand, if those assumptions are fulfilled for a random variable, 
it is automatically generic: 



Remark B.8 Let X be a J\A k -valued random variable, fulfilling the Assumptions in B.I, 
Then, the probability density function of X is strictly positive. Moreover, X is a generic 
random variable. 



Proof Let X be a .M /--valued random variable fulfilling the Assumptions in B.7 Let p be 
its continuous probability density function. 

We first show positivity: If X would not be strictly positive, then p would have a zero, 
say x. Taking B = {x}, the integral J B pdH vanishes, contradicting the assumption. 

Now we prove genericity, i.e. that for arbitrary irreducible algebraic properties A, B such 
that B does not imply A, the conditional probability Px(A\B) vanishes. Since B does not 
imply A, the algebraic set defined by B is not contained in A. Moreover, as B and A are 



37 



KlRALY, VON BUNAU, MEINECKE, BLYTHE AND MULLER 



irreducible and algebraic, A n B is also of positive codimension in B. Now by assumption, 
X has a positive continuous probability density function / which by assumption restricts 
to a probability density on B, being also positive and continuous. Thus the integral 



Px(A\B) = [ l A f(x)dH, 
Jb 



where H is the Hausdorff measure on B, exists. Moreover, it is zero, as we have derived 
that A has positive codimension in B. ■ 

This means that already under mild assumptions, which merely ensure well-definedness 



of the statement in the Definition B.3 of genericity, random variables are generic. The 
strongest of the comparably mild assumptions are the convergence of the conditional inte- 
grals, which allow us to renormalize the conditionals for all algebraic events. In the following 
example, a generic and a non-generic probability distribution are presented. 

Example B.9 Gaussian distributions and Gaussian mixture distributions are generic, since 
for any algebraic set B, we have 



/ l m dH = 0(t 
Jb 



dimB\ 



where B(t) = {x £ M n ; ||x|| < t} is the open disc with radius t. Note that this particular 
bound is false in general and may grow arbitrarily large when we omit B being algebraic, 
even if B is a smooth manifold. Thus Px(A\B) is bounded from above by an integral (or 
a sum) of the type 

roc 

/ exp(-t 2 )t a dt with a G N 
Jo 

which is known to be finite. 

Furthermore, sums of generic distributions are again generic; also, one can infer that any 
continuous probability density dominated by the distribution of a generic density defines 
again a generic distribution. 

An example of a non-generic but smooth distribution is given by the density function 

p{x,y) = jf e ~ x4y4 

where J\f is some normalizing factor. While p is integrable on M 2 , its restriction to the 
coordinate axes x = and y = is constant and thus not integrable. 

Now we will examine different known concepts of genericity and relate them briefly to 
the one we have adopted. 

A definition of genericity in combinatorics and geometry which can be encountered in 
different variations is that there exist no degenerate interpolating functions between the 
objects: 

Definition B.10 Let Pi, ... , P m be points in the vector space C n . Then P±, . . . , P m are 

general position (or generic, general) if no n + 1 points lie on a hyperplane. Or, in a 
stronger version: for any d £ N ; no (possibly inhomogenous) polynomial of degree d vanishes 



on 



(n+d 



) + 1 different Pi. 
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As TWfc is a finite dimensional C- vector space, this definition is in principle applicable 
to our situation. However, this definition is deterministic, as the Pi are fixed and no 
random variables, and thus preferable when making deterministic statements. Note that 
the stronger definition is equivalent to postulating general position for the points Pi, ... , P m 
in any polynomial kernel feature space. 

Since not lying on a hyperplane (or on a hypersurface of degree d) in C n is a non-trivial 
algebraic property for any point which is added beyond the n-th (resp. the ( n ^" d )-th) point 
Pi (interpreted as polynomial in A4k), our definition of genericity implies general position. 
This means that generic polynomials /i, . . . , f m & A4fc (almost surely) have the deterministic 
property of being in general position as stated in Definition |B.11| A converse is not true 
for two reasons: first, the Pi are fixed and no random variables. Second, even if one would 
define genericity in terms of random variables such that the hyperplane (resp. hypersurface) 
conditions are never fulfilled, there are no statements made on conditionals or algebraic 
properties other than containment in a hyperplane, also Lebesgue zero sets are not excluded 
from occuring with positive probability. 

Another example where genericity classically occurs is algebraic geometry, where it is 
defined rather general for moduli spaces. While the exact definition may depend on the 
situation or the particular moduli space in question, and is also not completely consistent, 
in most cases, genericity is defined as follows: general, or generic, properties are properties 
which hold on a Zariski-open subset of an (irreducible) variety, while very generic proper- 
ties hold on a countable intersection of Zariski-open subsets (which are thus paradoxically 
"less" generic than general resp. generic properties in the algebraic sense, as any general 
resp. generic property is very generic, but the converse is not necessarily true). In our spe- 
cial situation, which is the affine parameter space of tuples of polynomials, these definitions 
can be rephrased as follows: 

Definition B.ll Let B C C k be an irreducible algebraic set, let P = (f\, . . . , f m ) be a tuple 
of polynomials, viewed as a point in the parameter space B. Then a statement resp. property 
A of P is called very generic if it holds on the complement of some countable union of 
algebraic sets in B. A statement resp. property A of P is called general (or generic) if it 
holds on the complement of some finite union of algebraic sets in B. 

This definition is more or less equivalent to our own; however, our definition adds the prac- 
tical interpretation of generic/very generic/general properties being true with probability 
one, while their negations are subsequently true with probability zero. In more detail, the 
correspondence is as follows: If we restrict ourselves only to algebraic properties A, it is 
equivalent to say that the property A is very generic, or general for the P in B, and to say 
with our original definition that a generic P fulfilling B is also A; since if A is by assumption 
an algebraic property, it is both an algebraic set and a complement of a finite (countable) 
union of algebraic sets in an irreducible algebraic set, so A must be equal to an irreducible 
component of B; since B is irreducible, this implies equality of A and B. On the other hand, 
if A is an algebraic property, it is equivalent to say that the property not-^4 is very generic, 
or general for the P in B, and to say with our original definition that a generic P fulfilling B 
is not A - this corresponds intuitively to the probability-zero condition P{A\B) = which 
states that non- generic cases do not occur. Note that by assumption, not- A is then always 
the complement of a finite union of algebraic sets. 
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B.3 Arithmetic of generic polynomials 

In this subsection, we study how generic polynomials behave under classical operations in 
rings and ideals. This will become important later when we study generic polynomials and 
ideals. 

To introduce the reader to our notation of genericity, and since we will use the presented 
facts and similar notations implicitly later, we prove the following 

Lemma B.12 Let f € C[Xi, . . . , Xd] be generic of degrees k. Then: 

(i) The product af is generic of degree k for any fixed a £ C \ {0}. 

(ii) The sum f+g is generic of degree k for any g € C[.X~i, . . . , Xd] of degree k or smaller. 
(hi) The sum f + g is generic of degree k for any generic g 6 C[Xx, . . . , Xd] of degree k 

or smaller. 

Proof (i) is clear since the coefficients of g\ are multiplied only by a constant, (ii) follows 
directly from the definitions since adding a constant g only shifts the coefficients without 
changing genericity. (iii) follows since /, g are independently sampled: if there were alge- 
braic dependencies between the coefficients of / + g, then either / or g was not generic, or 
the /, g are not independent, which both would be a contradiction to the assumption. ■ 



Recall again what this Lemma means: for example, Lemma B.12| (i) does not say, as 
one could think: 

"Let X be a generic random variable with values in the vector space of degree k poly- 
nomials. Then X = aX for any a 6 C \ {0}." 



The correct translation of Lemma B.12 (i) is: 



"Let X be a generic random variable with values in the vector space of degree k poly- 
nomials. Then X' = aX for any fixed a E C \ {0} is a generic random variable with values 
in the vector space of degree k polynomials" 



The other statements in Lemma B.12 have to be interpreted similarly. 



The following remark states how genericity translates through dehomogenization: 

Lemma B.13 Let f € C[Xi, . . . ,Xd] be a generic homogenous polynomial of degree d. 
Then the dehomogenization f(X\, . . . ,Xd—i, 1) is a generic polynomial of degree d in the 
polynomial ring G[X\, . . . , Xd-i]- 

Similarly, let s < C[-Xi, . . . , Xq] be a generic homogenous ideal. Let f G s be a generic 
homogenous polynomial of degree d. 

Then the dehomogenization f(Xi, . . . ,Xjj—i, 1) is a generic polynomial of degree d in the 
dehomogenization of 5. 
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Proof For the first statement, it suffices to note that the coefficients of a homogenous 
polynomial of degree d in the variables Xi, . . . , Xr> are in bijection with the coefficients of a 
polynomial of degree d in the variables X±, . . . , Xd-i by dehomogenization. For the second 
part, recall that the dehomogenization of s consists exactly of the dehomogenizations of 
elements in s. In particular, note that the homogenous elements of s of degree d are in 
bijection to the elements of degree d in the dehomogenization of s. The claims then follows 
from the definition of genericity. ■ 



B.4 Generic spans and generic height theorem 

In this subsection, we will derive the first results on generic ideals. We will derive an 
statement about spans of generic polynomials, and generic versions of Krull's principal 
ideal and height theorems which will be the main tool in controlling the structure of generic 
ideals. This has immediate applications for the cumulant comparison problem. 

Now we present the first result which can be easily formulated in terms of genericity: 

Proposition B.14 Let P be an algebraic property such that the polynomials with property 
P form a vector space V . Let fi, ■ ■ ■ , f m £ C[Xl, . . . Xjj] be generic polynomials satisfying 



Proof It suffices to prove: if i < M, then fi is linearly independent from fi, ■ ■ ■ fi-i with 
probability one. Assuming the contrary would mean that for some i, we have 

i-1 



thus giving several equations on the coefficients of fi . But these are fulfilled with probability 



This may be seen as a straightforward generalization of the statement: the span of n 
generic points in G D has dimension min(n, D). 

We now proceed to another nontrivial result which will now allow us to formulate a 
generic version of Krull's principal ideal theorem: 

Proposition B. 15 Let Z C G D be a non-empty algebraic set, let f £ G[Xi,...Xr>] 
generic. Then f is no zero divisor in O(Z) = G[Xi, . . . Xp]/ 1(Z). 

Proof We claim: being a zero divisor in 0{Z) is an irreducible algebraic property. We 
will prove that the zero divisors in O(Z) form a linear subspace of Mk, and linear spaces 
are irreducible. 

For this, one checks that sums and scalar multiples of zero divisors are also zero divisors: 
if gi,g 2 are zero divisors, there must exist h±,h 2 such that g\ h\ = g 2 h 2 = 0. Now for any 
a € C, we have that 



P. Then 



rank span(/i , . . . , f m ) = min(m, dim V) . 




for some c& € C, 



zero by the genericity assumption, so the claim follows. 



(91 + ag 2 )(hih 2 ) = (gih 1 )h 2 + (g2h 2 )ahi = 0. 
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This proves that (gi -\-ag2) is also a zero divisor, proving that the zero divisors form a linear 
subspace and thus an irreducible algebraic property. 

To apply the genericity assumption to argue that this event occurs with probability zero, 
we must exclude the possibility that being a zero divisor is trivial, i.e. always the case. This 
is equivalent to proving that the linear subspace has positive codimension, which is true if 
and only if there exists a non-zero divisor in O(Z). But a non-zero divisor always exists 
since we have assumed Z is non-empty: thus I(Z) is a proper ideal, and O(Z) contains C, 
which contains a non-zero divisor, e.g. the one. 

So by the genericity assumption, the event that / is a zero divisor occurs with probability 
zero, i.e. a generic / is not a zero divisor. Note that this does not depend on the degree of 
/• ■ 



Note that this result is already known, compare Conjecture B in (Pardue. 2010). 



A straightforward generalization using the same proof technique is given by the following 

Corollary B.16 Let I < G[X\, . . . ,Xu], let P be a non-trivial algebraic property. Let 
f € C[Xi, . . .Xo] be a generic polynomial with property P. If one can write f = f + c, 
where f is a generic polynomial subject to some property P' , and c is a generic constant, 
then f is no zero divisor in C[X\, . . . , Xd]/I. 

Proof First note that / is a zero divisor in C[Xi, . . . ,Xd]/I if and only if / is a zero 
divisor in C[Xi, . . . , Xjj]/^/X. This allows us to reduce to the case that X = I(Z) for some 
algebraic set Z C <C D . 



Now, as in the proof of Proposition B.15, we see that being a zero divisor in O(Z) is an 
irreducible algebraic property and corresponds to a linear subspace of Aik, where k = deg /. 
The zero divisors with property P are thus contained in this linear subspace. Now let / 
be generic with property P as above. By assumption, we may write / = /'-(- c. But c is 
(generically) no zero divisor, so / is also not a zero divisor, since the zero divisors form a 
linear subspace of Mk- Thus / is no zero divisor. This proves the claim. ■ 



since we can 



Note that Proposition B.15 is actually a special case of Corollary B.16 
write any generic polynomial / as /' + c, where /' is generic of the same degree, and c is a 
generic constant. 

The major tool to deal with the dimension of generic intersections is Krull's principal 
ideal theorem: 

Theorem B.17 (Krull's principal ideal theorem) Let R be a commutative ring with 
unit, let f E R be non-zero and non-invertible. Then 

ht(/) < 1, 

with equality if and only if f is not a zero divisor in R. 
The reader unfamiliar with height theory may take 

ht 1= codimV(X) 

as the definition for the height of an ideal (caveat: codimension has to be taken in R). 
Reformulated geometrically for our situation, Krull's principal ideal theorem implies: 
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Corollary B.18 Let Z be a non-empty algebraic set in <C D .Then 

codim(Z n V(/)) < codim Z + 1. 
Proof Apply KrulPs principal ideal theorem to the ring R = 0{Z) = G[Xi, . . . , Xd\/\{Z). 



Together with Proposition B.15 , one gets a generic version of Krull's principal ideal theorem: 



Theorem B.19 (Generic principal ideal theorem) Let Z be a non-empty algebraic set 
in G D , let R = O(Z), and let f G C[A"i, . . . , Xd} be generic. Then we have 

ht(/) = 1. 

In its geometric formulation, we obtain the following result. 

Corollary B.20 Consider an algebraic set Z C & D , and the algebraic set V(/) for some 
generic f 6 C[Xi, . . . , Xd]- Then 

codim(Zn V(/)) = min(codimZ + 1, D + 1). 



Proof This is just a direct reformulation of Theorem B.19 in the vein of Corollary B.18 
The only additional thing that has to be checked is the case where codim Z = D + 1, which 
means that Z is the empty set. In this case, the equality is straightforward. ■ 



The generic version of the principal ideal theorem straightforwardly generalizes to a 
generic version of Krull's height theorem. We first mention the original version: 

Theorem B.21 (Krull's height theorem) Let R be a commutative ring with unit, let 
I = (fx, . . . , f m ) <! R be an ideal. Then 

htX < m, 

with equality if and only if fi, . . . , f m is an R-regular sequence, i.e. fi is not invertible and 
not a zero divisor in the ring R/{fi, . . . , fi~i) for all i. 

The generic version can be derived directly from the generic principal ideal theorem: 

Theorem B.22 (Generic height theorem) Let Z be an algebraic set in € D , let T = 

(fx, . . . , f m ) be a generic ideal in G[Xx, • • • , X^]. Then 

ht(I(Z) + X) = min(codim Z + m, D + l). 

Proof We will write R = O(Z) for abbreviation. 

First assume m < D + 1 — codim Z. It suffices to show that fi, ■ ■ ■ , f m forms an ir- 
regular sequence, then apply Krull's height theorem. In Proposition |B.15} we have proved 
that fi is not a zero divisor in the ring 0(Z n V(/i, . . . , fi-x)) (note that the latter ring 
is nonzero by Krull's height theorem). By Hilbert's Nullstellensatz, this is the same as 
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the ring R/ \J (fi, . . . , fi-i). But by the definition of radical, this implies that fi is no zero 
divisor in the ring R/(fi, . . . , fi-i), since if fi ■ h = in the first ring, we have 

(fi ■ hf = n ■ vf-W) = o 

in the second. Thus the fi form an i?-regular sequence, proving the theorem for the case 
m < D + 1 — codim Z. 

If now m > k := D + 1 — codim Z, the above reasoning shows that the radical of 
l(Z) + (fi, . . . , fk) is the module (1), which means that those are equal. Thus 

l(Z) + (/!,..., fk) = KZ) + (fl,...,f m ) = (1), 

proving the theorem. 

Note that we could have proved the generic height theorem also directly from the generic 
principal ideal theorem by induction. ■ 

Again, we give the geometric interpretation of Krull's height theorem: 

Corollary B.23 Let Z\ be an algebraic set in C D , let Z2 be a generic algebraic set in C D . 
Then one has 

codim(Zi Pi Z2) = min(codim Z\ + codim Z2, D + 1). 



Proof This follows directly from two applications of the generic height theorem B.22 first 
for Z = <C D and Z2 = V(X), showing that codim Z2 is equal to the number m of generators 
of I; then, for Z = Z\ and Z2 = V(X), and substituting m = codim Z2. ■ 



We can now immediately formulate a homogenous version of Proposition B.23 



Corollary B.24 Let Z\ be a homogenous algebraic set in C D , let Z2 be a generic homoge- 
nous algebraic set in G D . Then one has 

codim(Zi Pi Z2) = min(codimZi + codim Z2, D). 

Proof Note that homogenization and dehomogenization of a non-empty algebraic set do 
not change its codimension, and homogenous algebraic sets always contain the origin. Also, 



one has to note that by Lemma B.13, the dehomogenization of Z2 is a generic algebraic set 
in C^- 1 . ■ 



Finally, using Corollary B.16, we want to give a more technical variant of the generic 
height theorem, which will be of use in later proofs. First, we introduce some abbreviating 
notations: 

Definition B.25 Let f G C[Ai, . . . Xp] be a generic polynomial with property P. If one 
can write f = f + c, where f is a generic polynomial subject to some property P 1 , and 
c is a generic constant, we say that f has independent constant term. If c is generic and 
independent with respect to some collection of generic objects, we say that f has independent 
constant term with respect to that collection. 
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In this terminology, Corollary B.16| rephrases as: a generic polynomial with independent 
constant term is no zero divisor. Using this, we can now formulate the corresponding variant 
of the generic height theorem: 

Lemma B.26 Let Z be an algebraic set in C D . Let fi,...,f m £ C[.X~i, . . . , Xd] be generic, 
possibly subject to some algebraic properties, such that fi has independent constant term 
with respect to Z and fx, . . . , fi-i. Then 

ht(I(Z) + J) = min(codimZ + m, D + l). 



Proof Using Corollary B.16, one obtains that fi is no zero divisor modulo I(Z)+(/i, . . . , fi+i) 
Using Krull's height theorem yields the claim. ■ 



B.5 Generic ideals 



The generic height theorem B.22 has allowed us to make statements about the structure 
of ideals generated by generic elements without constraints. However, the ideal X in our 
the cumulant comparison problem is generic subject to constraints: namely, its generators 
are contained in a prescribed ideal, and they are homogenous. In this subsection, we will 
use the theory developed so far to study generic ideals and generic ideals subject to some 
algebraic properties, e.g. generic ideals contained in other ideals. We will use these results 
to derive an identifiability result on the marginalization problem which has been derived 
already less rigourously in the supplementary material of ( von Biinau et al. 2009 ) for the 
special case of Stationary Subspace Analysis. 

Proposition B.27 Let s < C[Xi, . . . , Xd] be an ideal, having an H-basis g±, . . . , g n . Let 

X= (/i,...,/ m ), m > max(L> + l,n) 
with generic fi £ 5 such that 

deg/j > max(deggj) for all I < i < m. 

j 

Then 1 = 5. 

Proof First note that since the gi form a degree-first Groebner basis, a generic / G s is of 
the form 

n 

f = s ^g k h k with generic h k , 

k=l 

where the degrees of the h k are appropriately chosen, i.e. degh k < degf — degg k . 
So we may write 

n 

fi = ^2 9k h ki with generic h ki , 

k=l 

where the h ki are generic with appropriate degrees, and independently chosen. We may 
also assume that the fi are ordered increasingly by degree. 
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To prove the statement, it suffices to show that gj £ I for all j. Now the height 



theorem B.22 implies that 



(/ill, • • • him) = (1), 



since the h^i were independently generic, and m > D + l. In particular, there exist polyno- 
mials si, . . . , s m such that 



} ] Sjhu = 1. 



i=l 



Thus we have that 



m n 



n m 



^ Sj/i = Sj Okhki = / J 9k/ J Sih k i 



i=l 



i=l k=l 
n m 



k=l i=l 

n 



gi + ^2 9k ^2 S{hki =: 91 + ^2 9kh 'k- 

k=2 i=l k=2 



Subtracting a suitable multiple of this element from the /i, . . . , f m , we obtain 

n n 
fi =J2 9k ( hki ~ hlih 'k) =: J2 9kH ' k i- 



k=2 



k=2 



We may now consider huh' k as fixed, while the hki are generic. In particular, the h' ki have 
independent constant term, and using Lemma B.26[ we may conclude that 

(h' 21 ,...,h' 2m ) = (1), 

allowing us to find an element of the form 

n 

92 + 9k ■ ■ ■ ■ 

k=3 



in X. Iterating this strategy by repeatedly applying Lemma B.26 we see that gp. is contained 
in I, because the ideals I and s have same height. Since the numbering for the gj was 
arbitrary, we have proved that gj £ I, and thus the proposition. ■ 

The following example shows that we may not take the degrees of the fi completely arbitrary 
in the proposition, i.e. the condition on the degrees is necessary: 



Example B.28 Keep the notations of Proposition B.27 Let s = (X 2 — Xf, X3), and fi € 5 
generic of degree one. Then 

(fl,...,fm) = (X 3 ). 

This example can be generalized to yield arbitrarily bad results if the condition on the 
degrees is not fulfilled. 

However note that when s is generated by linear forms, as in the marginalization prob- 
lem, the condition on the degrees vanishes. 



We may use Proposition B.27 also in another way to derive a more detailed version of 
the generic height theorem for constrained ideals: 
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Proposition B.29 Let V be a fixed d-codimensional algebraic set in C D . Assume that 
there exist d generators gi, ■ ■ ■ ,gd for I(V). Let fi, ■ ■ ■ , f m be generic forms in I(V) such 
that deg fi > deg gi for 1 < i < min(m, d). Then we can write V(/i, . . . , f m ) = V U U with 
U an algebraic set of 

codim U > min(m, D + 1), 
the equality being strict for m < codim V. 



Proof If m > D + 1, this is just a direct consequence of Proposition B.27 



First assume m = d. Consider the image of the situation modulo X r , 
corresponds to looking at the situation 



,X D . This 



m—l 



where H is the linear subspace given by X n 



Xd = 0. Since the coordinate system 



was generic, the images of the /, will be generic, and we have by Proposition B.27 that 
V(/i, . . . , f m ) fi H = V D H. Also, the H can be regarded as a generic linear subspace, 
thus by Corollary B.23 we see that V(/i, . . . , f m ) consists of V and possibly components 
of equal or higher codimension. This proves the claim for m = codim V. 

Now we prove the case m > d. We may assume that m = D + 1 and then prove the 
statement for the sets V(/i, . . . , fi),d < i < m. By the Lasker-Noether-Theorem, we may 
write 

V(fi,...,fd) = VUZ 1 U---UZ N 



for finitely many irreducible components Zj with codim Zj > codim V. Proposition B.27 
now states that 

V(h,..-,fm) = V. 

For i> d, write now 

Zji = Zj n v(/i, ...,f i ) = z j n V(f d+ i, ...,h). 

With this, we have the equalities 

v(/i, ...,fi) = V(h, . . . , u) n v(/ d+1 , ...,/,) 

= v u (Zi n v(/ d+1 , . . . , f^) u • • • u (z N n v(/ d+1 , 
= vuz li u---uz Ni . 



Ji)) 



for i > d. Thus, reformulated, Proposition B.27 states that Zj m = for any j. We can now 
infer by Krull's principal ideal theorem B.17 that 



codim Zji < codim Z 



+ 1 



for any But since codim Zj m = D + 1, and codim Zjd > d, we thus may infer that 
codim Zji > i for any d < i < m. Thus we may write 

V(/i, ...J l ) = VUU with u = z u u---uz m 

with codim U > i, which proves the claim for m > codim V. 
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The case m < codim!/ can be proved again similarly by Krull's principal ideal theo- 
rem B.17 it states that the codimension of V(/i, . . . , fi) increases at most by one with each 
i, and we have seen above that it is equal to codim V for i = codim V. Thus the codimension 
of V(/i, . . . , fi) must have been i for every i < codim V. This yields the claim. ■ 

Note that depending on V and the degrees of the fi, it may happen that even in the generic 



case, the equality in Proposition B.29 is not strict for m > codim V: 



Example B.30 Let V be a generic linear subspace of dimension d in <C D , let f\, . . . , f m G 
I(V) be generic with degree one. Then V(/i, . . . , f m ) is a generic linear subspace of dimen- 
sion max(D — m, d) containing V. In particular, if m > D — d, then V(/i, . . . , f m ) = V. 
In this example, U = V(/i, . . . , f m ), if m < codim V, with codimension m, and U = 0, if 
m > codim V, with codimension D + 1. 

Similarly, one may construct generic examples with arbitrary behavior for codim U when 
m > codim V, by choosing V and the degrees of fi appropriately. 

Similarly as in the geometric version for the height theorem, we may derive the following 
geometric interpretation of this result: 

Corollary B.31 Let V C Z\ be fixed algebraic sets in <C D . Let Z2 be a generic algebraic 
set in <C D containing V. Then 

codim(Zi n Z 2 \ V) > min(codim(Zi \ V) + codim(Z 2 \ V), D + 1). 

Informally, we have derived a height theorem type result for algebraic sets under the con- 
straint that they contain another prescribed algebraic set V. 



We also want to give a homogenous version of Proposition B.29, since the ideals in the 
paper are generated by homogenous forms: 



Corollary B.32 Let V be a fixed homogenous algebraic set in C D . Let fi, ■ ■ ■ , f m be generic 



homogenous forms in I(V), satisfying the degree condition as in Proposition B.29. Then 
V(/i, . . . , f m ) = V + U with U an algebraic set fulfilling 

codim U > min(m, D). 

In particular, if m > D, then V(/i, . . . , f m ) = V. Also, the maximal dimensional part of 
V(/i, . . . , f m ) equals V if and only if m> D — dim V. 



Proof This follows immediately by dehomogenizing, applying Proposition B.29 and ho- 
mogenizing again. ■ 

From this Corollary, we now can directly derive a statement on the necessary number of 
epochs for the identifiability of the projection making several random variables appear iden- 
tical. For the convenience of the reader, we recall the setting and then explain what iden- 
tifiability means. The problem we consider in the main part of the paper can be described 
as follows: 
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Problem B.33 Let X\, . . . ,X m be smooth random variables, let 

qi = [Ti, ...,T D }o (K 2 (Xi) - K 2 (X m )) , 1 < i < m - 1 

and 

fi = [Ti, . . . ,T D ] o (Ki(Xj) - Ki(X m )) , 1 < i < m- 1 

be the corresponding polynomials in the formal variables T\, . . . ,Tq. What can one say 
about the set 

S = V(<7i, . . . , q m -l,hi ■ ■ ■ )/m-l)- 

If there is a linear subspace S on which the cumulants agree, then the qi, fi vanish on S. If 
we assume that this happens generically, the problem reformulates to 

Problem B.34 Let S be a d-dimensional linear subspace of <C D , let s = 1(5"), and let 
/l,...,/jv be generic homogenous quadratic or linear polynomials in s. How does S' = 
\(h,...,f N ) relate to S?. 

Before giving bounds on the identifiability, we first begin with a direct consequence of 



Corollary B.32 



Remark B.35 The highest dimensional part of S' = V(/i, . . . , /at) is S if and only if 

N > D-d. 

For this, remark that I(S') is generated in degree one, and thus the degree condition in 



Corollary B.32 becomes empty. 



We can now also get an identifiability result for S: 

Proposition B.36 Let /i, • • • ,/jv be generic homogenous polynomials of degree one or two, 
vanishing on a linear space S of dimension d > 0. Then S is identifiable from the fi alone 
if 

N > D-d + 1. 

Moreover, if all fi are quadrics, then S is identifiable from the fi alone only if 

N>2. 



Proof Note that the fx, . . . , fa are generic polynomials contained in s := l(S). 



First assume N > D — d + 1. We prove that S is identifiable: using Corollary |B. 32 



one 



sees now that the common vanishing set of the fi is S up to possible additional components 
of dimension d — 1 or less. I.e. the radical of the ideal generated by the fi has a prime 
decomposition 

V(fiT--JN) = s n pi n • • • n p k , 

where the pi are of dimension d — 1 or less, while s has dimension d. So one can use one 
of the existing algorithms calculating primary decomposition to identify s as the unique 
component of the highest dimensional part, which proves identifiability it N >D — d+1. 
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Now we prove the only if part: assume that N = 1, i.e. we have only a single f\. Since 
fi is generic with the property of vanishing on S, we have 

D-d 
i=i 

where gi, . . . , gr>-d is some homogenous linear generating set for 1(5), and hi, ... , h£>_ c i are 
generic homogenous linear forms. Thus, the zero set V(/i) also contains the linear space 
S' = V(/ii, . . . , which is a generic d-dimensional linear space in C D and thus differ- 

ent from S; no algorithm can decide whether S or S' is the correct solution, so S is not 
identifiable. ■ 



Note that there is no obvious reason for the lower bound N > D — d + 1 given in 
Proposition B.36 to be strict. While it is most probably the best possible bound which is in 
D and d, in general it may happen that S can be reconstructed from the ideal (/i, . . . , /at) 
directly. The reason for this is that a generic homogenous variety of high enough degree 
and dimension does not need to contain a linear subspace of fixed dimension d in general. 
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